IMGUI

Immediate Mode Graphical User Interface (IMGUI): User Interfaces in a Real-Time Loop

Background

Dreamler has always been intended to be a real-time native application, with 60 frames per second hardware-accelerated graphics and real-time networking. Technically it is very much the same as a typical modern PC game. We started off with DirectX 9.0c on Windows, and have since migrated to OpenGL 3.x in order to support Mac OS X as well as Linux.

As the intent was always there to include arbitrary animations and cool visuals, we needed to approach all aspects of the user interface from the “real-time” angle. I personally have a lot of experience writing more traditional user interface applications on Windows (using Microsoft Foundation Classes), but have since come across a new and somewhat experimental approach called the Immediate Mode Graphical User Interface (IMGUI).

IMGUI is an alternative to the more traditional object oriented RMGUI (Retained Mode Graphical User Interface), with the main difference being that little or no application state is “retained” in the user interface components. RMGUI typically models the various widgets a user might interact with as instances of classes (objects), while IMGUI reduces “widgets” to procedural function calls. Such a function call will typically directly poll input devices as well as draw something on the screen.

Pros

There are several gains to the IMGUI approach:

  • There is no setup phase in which a programmer “creates widgets”. Instead the “existence” of any given “widget” is simply a function call. If no function is called, no widget appears on screen.
  • There are no “button click callbacks” as are typically found in RMGUI. Typically IMGUI “widget” function calls return interaction information directly, for example:

if(imgui::Interaction::LEFT_MOUSE_BUTTON_CLICK == imgui.button(style, position, text))
{
    // do something in the application…
}

  • There is no state synchronization between the user interface and the application. Typically in RMGUI you need to explicitly sync widget state with application state in order for the user interface to accurately reflect the application state. This is indeed the “retention” that we speak of.
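The points above can be made concrete with a minimal sketch. All names here are hypothetical (Dreamler’s actual API differs); the point is that the widget is nothing but a function call that hit-tests input and feeds the result straight back into application state:

```cpp
#include <cassert>

// Hypothetical minimal types; not Dreamler's actual API.
struct Rect  { float x, y, w, h; };
struct Input { float mouse_x, mouse_y; bool mouse_clicked; };

static bool contains(const Rect& r, float px, float py)
{
    return px >= r.x && px < r.x + r.w && py >= r.y && py < r.y + r.h;
}

// An immediate-mode "button": polls input and reports the interaction
// directly. A real implementation would also draw the button here.
bool button(const Input& in, const Rect& bounds)
{
    return in.mouse_clicked && contains(bounds, in.mouse_x, in.mouse_y);
}

// Two simulated frames: a click inside the button, then one outside.
int run_two_frames()
{
    int counter = 0;                      // application state, owned by the app
    const Rect r{10, 10, 100, 30};

    if(button(Input{50, 20, true}, r))  { ++counter; }   // hit
    if(button(Input{500, 20, true}, r)) { ++counter; }   // miss
    return counter;
}
```

There is no creation step, no callback registration, and no state to synchronize: if the call is simply skipped next frame, the button ceases to exist.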

All of this allows for very dynamic user interfaces as well as rapid iteration on ideas, since there is far less code to write than in the Retained Mode equivalent. Dreamler is all about novel and custom visualizations and interactions, and being able to iterate through various schemes with little to no setup overhead has been very beneficial to keeping our rate of development high.

Cons

On the downside, IMGUI has the caveat of basically requiring real-time refresh rates (60 Hz is typical) in order to feel responsive and to not miss clicks / keypresses. The user interface cannot so much be said to “exist”; rather, it is “calculated” or “rendered” by a real-time loop that continually iterates over application state, draws “widgets”, and reacts to interactions with them.

Indeed, it might be hard to envision interacting with something that doesn’t “exist” in the classic sense of residing in RAM with some kind of bounding information. You can’t very well click a button that you don’t see. As a result, IMGUI typically has at least a single frame of input latency, as the user is clicking on a button they “saw last frame”.
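This one-frame latency can be sketched in a few lines (hypothetical names, not Dreamler’s actual code): each widget remembers the bounds it was drawn with last frame and hit-tests clicks against those, because this frame’s bounds are not known until the widget function has already run:

```cpp
#include <cassert>

struct Rect { float x, y, w, h; };

// Sketch of one-frame input latency: hit-testing uses the bounds the
// widget occupied on the PREVIOUS frame -- the bounds the user actually saw.
struct Widget
{
    Rect prev{0, 0, 0, 0};   // where the widget was drawn last frame

    // Returns true if (mx, my) clicked the widget as the user saw it.
    bool frame(Rect now, float mx, float my, bool clicked)
    {
        const bool hit = clicked &&
            mx >= prev.x && mx < prev.x + prev.w &&
            my >= prev.y && my < prev.y + prev.h;
        prev = now;          // remember for next frame's hit-test
        return hit;
    }
};
```

A click during the very first frame can never register, and a click just after a widget moves tests against where it used to be; at 60 Hz neither is normally noticeable.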

This can result in what I call “frame shearing”, analogous to what is typically seen on a computer monitor when vertical sync is disabled. If an interaction with an IMGUI “widget” call results in a change to application state which affects the rendering of the user interface further down the line, a frame will be generated that may well contain portions from two different application states. However, due to the fact that the user interface is being recalculated / redrawn at 60 frames per second, frame shearing often does not have any practical impact beyond at worst a “flicker” when the sheared frame is rendered.

It is possible to completely eliminate the effects of frame shearing by having mechanisms in place that detect the shear and restart the evaluation / rendering process for the entire user interface before it is output to the screen. In practice this results in a system where the user interface is generated once per frame in the common case and twice when a shear occurs, since a user can typically only interact with a single widget during any single frame.
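A minimal sketch of such a shear-detection loop, under the assumption that a UI pass can report whether it mutated application state (names are hypothetical):

```cpp
#include <cassert>
#include <functional>

struct FrameStats { int passes; };

// Build one frame of UI. evaluate_ui runs the whole IMGUI pass and returns
// true if application state changed during the pass (a potential shear).
// Since at most one widget is interacted with per frame, a single re-run
// is enough for the output to reflect one consistent application state.
FrameStats build_frame(const std::function<bool()>& evaluate_ui)
{
    FrameStats stats{1};
    if(evaluate_ui())      // first pass sheared: state changed mid-frame
    {
        evaluate_ui();     // second pass sees only the final state
        stats.passes = 2;
    }
    return stats;
}
```

The cost is bounded: the common case stays at one pass, and a shear costs one extra evaluation of the UI before anything reaches the screen.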

Implications for the GPU

IMGUI really only works due to the fact that GPUs are ubiquitous and fast these days. You need to be able to fill high-resolution displays at a 60 Hz refresh rate. One does however need to be aware that typical IMGUI implementations aren’t really the best way to make use of GPU horsepower (beyond fill-rate), as IMGUIs tend to be implemented by generating lots of dynamic geometry that needs to be tossed across the bus from main memory to the GPU each frame.

GPUs like big meshes (millions of triangles) of static geometry that are cached in GPU memory, with the CPU asking for the rendering of a relatively limited number of instances / batches of this geometry (around 2500) in any given frame. This isn’t really what our IMGUI is doing; rather, it is generating hundreds of thousands of quads (2 triangles each) dynamically in main memory and then throwing them at the GPU in a very small number of batches (10–20) per frame. A typical quad might be part of a button or a single glyph from our font system.
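The quad-generation path can be sketched as follows. The vertex layout here is an assumption (position, texcoord, packed colour; 20 bytes), not our actual format, but it puts a number on the per-frame bus traffic:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed per-vertex layout: position + texcoord + packed colour (20 bytes).
struct Vertex { float x, y, u, v; std::uint32_t rgba; };

// Append one quad (two triangles, six vertices, unindexed) to a CPU-side
// vector that will later be uploaded to the GPU as part of one batch.
void push_quad(std::vector<Vertex>& out, float x, float y, float w, float h,
               std::uint32_t rgba)
{
    const Vertex v[4] = {
        {x,     y,     0, 0, rgba}, {x + w, y,     1, 0, rgba},
        {x + w, y + h, 1, 1, rgba}, {x,     y + h, 0, 1, rgba},
    };
    const int idx[6] = {0, 1, 2, 0, 2, 3};   // two triangles per quad
    for(int i : idx) out.push_back(v[i]);
}

// Bytes that must cross the bus each frame for a given quad count.
std::size_t bus_bytes_per_frame(std::size_t quads)
{
    return quads * 6 * sizeof(Vertex);
}
```

With this assumed 20-byte vertex, 128k quads come to roughly 15 MB of dynamic geometry per frame, on the order of 1 GB/s at 60 Hz, which makes the bus-bound behaviour described below plausible.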

This is really too bad, as we are effectively bound by the bus transfer rate. In tests on typical GPUs I have yet to exceed 128k quads generated in this manner while still retaining 60 Hz. Granted, I am not a graphics specialist, but I have yet to come up with a better approach to satisfying the requirements of a typical IMGUI.

Speculative approaches might include allowing for application-level construction of “higher order primitives”, like a pre-baked and GPU-cached mesh that represents an Activity or Link, or perhaps even the entire gameboard. Indeed, it would be relatively trivial to cache the output of any given frame (all graphics) on the GPU and re-use it across frames. This cache would however need to be re-calculated every time “something changes”, which in itself might be hard to detect due to the very dynamic nature of the IMGUI itself, but for the most part this would probably significantly improve performance.

If one were to switch input handling (as well as networking) to drive the entire evaluation on an event-based premise, then the GPU caching of an entire frame of graphics would be pretty clean to build. Any mouse moves, mouse clicks, key presses, or network inputs would trigger the re-evaluation of the entire user interface and re-generate any GPU caches of the graphics. Observe, however, that one would need to include any local animations as input drivers as well, and if these cause changes every frame one would be back at square one (with perhaps an expensive re-build of GPU-cached meshes on top of everything else).
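The event-driven scheme reduces to a dirty flag around the cached frame. A sketch (hypothetical names; the actual GPU upload is elided):

```cpp
#include <cassert>

// Event-driven frame cache: the full IMGUI pass and GPU mesh upload run
// only when some input event has arrived since the last frame; otherwise
// the cached geometry is redrawn untouched.
struct FrameCache
{
    bool dirty = true;   // first frame must always build
    int rebuilds = 0;    // how many times the expensive path ran

    // Any mouse move, click, key press, network input -- or animation tick.
    void on_event() { dirty = true; }

    // Returns true if this frame required rebuilding the cached mesh.
    bool render()
    {
        const bool rebuilt = dirty;
        if(dirty)
        {
            ++rebuilds;   // re-run the whole IMGUI pass, re-upload geometry
            dirty = false;
        }
        return rebuilt;
    }
};
```

The caveat from the paragraph above shows up directly: if an animation calls on_event() every frame, render() rebuilds every frame and the cache buys nothing.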

The “draw call boundary” of around 2500 batches is a real problem, at least on legacy DirectX 9.0c. I have no experience with newer versions of DirectX, nor do I know if this limitation is as severe on OpenGL drivers. One could envision a system where each individually movable piece of the gameboard (like an Activity) was rendered in a single batch (mesh, transform, shader, textures), but it seems that we might quickly run into the draw call boundary. Various instancing techniques (streams of transforms) might help here.

Text would probably have to be pre-cached on a per-string basis and re-used; this would probably be a big win, as currently string construction and linebreaking is pretty expensive. Given that text changes infrequently this might be a worthwhile optimization, but it would result in a huge number of unique string meshes on the GPU.
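A per-string cache could be as simple as a hash map keyed by the string itself. In this sketch (hypothetical names) the “mesh” is reduced to a glyph-quad count, standing in for the baked GPU geometry:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Stand-in for a baked, GPU-resident text mesh.
struct TextMesh { std::size_t glyph_quads; };

// Per-string text-mesh cache: the expensive layout path (shaping,
// linebreaking, quad generation) runs once per unique string; later
// frames reuse the baked result.
struct TextCache
{
    std::unordered_map<std::string, TextMesh> meshes;
    int layouts = 0;   // how many times the expensive path actually ran

    const TextMesh& get(const std::string& s)
    {
        auto it = meshes.find(s);
        if(it == meshes.end())
        {
            ++layouts;  // expensive: layout + mesh bake happens here only
            it = meshes.emplace(s, TextMesh{s.size()}).first;
        }
        return it->second;
    }
};
```

The memory concern from the paragraph above is visible here too: every distinct string ever displayed keeps a mesh alive, so a real implementation would want some eviction policy.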

If there is access to newer GPUs with tessellation hardware, one might be able to generate all relevant Activity shapes, Link shapes, as well as complete text string meshes, entirely on the GPU. I speculate that the actual per-frame CPU-to-GPU bus traffic would be simple data lists representing Activity positions, Link endpoints, and text positions (and strings somehow), to be amplified into actual meshes by the hardware.

Bottom Line Tradeoff 

Both IMGUI itself and the relatively inefficient use of GPU resources that currently results from it are a tradeoff. It is a simple system that allows for maximum iteration and revision of the actual look of Dreamler without having to implement and revise specific cache systems in order to gain performance. It also has the benefit of not requiring extravagant hardware (as a tessellation solution would).

I would personally wait until Dreamler solidifies further before looking at performance with fresh eyes and maximum real-world usage information in order to construct an efficient rendering pipeline.
