As graphical requirements evolve to include more polygons, more clutter, and better lighting to create more realistic-looking scenes, we are forced to produce more art to meet those requirements. One technique that helps to conserve development time is an instancing approach. For example, a single chair is modeled and placed all around the game world (see Figure 4-1). Instancing helps immensely to shrink development time. It is, however, all too common for modern games to become limited by the number of draw calls (Wloka 2003). If each instance used in a scene leads to one more batch, it may not be possible to realize the benefits of instancing: batches become the bottleneck. Segment buffering is a technique that collects multiple instances that are close to each other in the scene and merges them into "über-instances," thus reducing the number of batches and providing a simple and elegant solution to the batch bottleneck.
Figure 4-1 A Scene with Many Instances of the Same Object
4.1 The Problem Space
Figure 4-1 shows a scene containing many static objects of the same material. Most GPUs cannot efficiently render scenes like this for more than a thousand instances; each instance potentially requires a render-state change, such as transform changes, light map texture changes, or vertex stream changes. These render-state changes cause driver and thus CPU overhead, ultimately resulting in poor performance. Although we could work around this problem on the content side by merging multiple models to create new models to instance around the world (thereby reducing the total number of instances), such an approach would require making a fair amount of custom art and could cause the maps to look repetitious.
4.2 The Solution
Segment buffering automates the merging of similar instances while maintaining most of the benefits of rendering separate instances. The primary benefits of segment buffering are thus a nonrepetitious look and the ability to not draw some of the original instances, as if they were removed from the visibility set.
4.3 The Method
Let's assume that we have a list of instances of a specific model. The next step is to spatially organize the instances such that in a 1D array, objects that are near each other spatially are also near each other in the array. With this new, organized list, the final step is to stitch all of the individual instances together as if they were one big instance, recording which parts in the new vertex/index buffers belong to the individual instances. We call each record in this list a segment. To render objects inside the segment buffer, simply generate a list of segments to be rendered while merging sequential segments in the list. This procedure thus generates a new list of optimally merged sections of the vertex/index buffer to render. Segments are rendered as static batches (see Chapter 3 of this book, "Inside Geometry Instancing").
4.3.1 Segment Buffering, Step 1
The key to making segment buffering work and work well is in the spatial organization of the segments inside the segment buffer. For this, we can use any spatial organization structure. K-d trees (Samet 1989, 1990), where k is 3, perform well for 2015's game Men of Valor: Vietnam. Octrees (Suter 1999) or any other spatial data structures would probably work just as well.
The tree is generated from a list of points that represent the locations of the instances for which we are creating a segment buffer. Once we have this tree, we do a depth-first traversal, always branching left first at every interior tree node. When we hit a leaf, we add all the points contained in the leaf to the new ordered list. When the traversal is complete, we have a spatially ordered list of instances.
The segment buffers and the spatially ordered list of instances need to be generated only when a new list of instances is introduced. In the majority of cases, then, the segment buffers and ordered list of instances need to be created only at load time.
4.3.2 Segment Buffering, Step 2
Given a spatially ordered list of instances, we iterate through the list and construct a single vertex/index buffer containing all of the instances. We make sure to transform each instance's vertex buffer components into world space before the instance is written into the big vertex buffer. Also, we need to record what parts of the big vertex/index buffer belong to each instance, for later reference during rendering. Again, this big vertex/index buffer needs to be created only in the event a new list of instances is introduced, typically at load time.
4.3.3 Segment Buffering, Step 3
When rendering, instead of immediately drawing an instance that is segmented, we generate a list of all these instances that need to be rendered. Instances outside the view frustum, for example, need not be rendered and should be omitted. Then we translate this list of instances into the parts of the big vertex/index buffer we previously generated for segment buffering. While doing this, we can merge segments adjacent to one another in the big vertex/index buffer to achieve an optimal list of the pieces of those buffers to be rendered. The result is the exact same output as if the instance list was rendered individually without segment buffering.
4.4 Improving the Technique
To make segment buffering a viable technique for a game that uses light mapping extensively, we need to implement automatic texture-atlas generation (NVIDIA 2004). This is because we cannot segment instances together that require render-state changes in between the rendering of each instance. In scenes where every instance has its own light map, a texture change is required between the rendering of each instance. Automatic texture-atlas generation is fairly easy to implement in the context of segment buffering because every instance inside the big vertex buffer has a unique section. Thus, the texture coordinates can be directly modified without affecting any other instance. This is in contrast to the traditional method of modifying the texture coordinates inside a vertex or pixel shader.
In this chapter, we have described a technique to significantly reduce the number of batches rendered in a single displayed frame. In doing so, the technique enables us to create a much richer and more realistic-looking environment. In addition to segment buffering, we described enhancements that make it applicable in situations where it normally would not be appropriate.