When you are creating triangle meshes for ray tracing or reusing meshes that have been successfully used in rasterization-based rendering, there are some pitfalls that can cause surprising performance issues.
Some mesh properties that have been acceptable in rasterization can be problematic in ray tracing or require specific handling to work as expected. This post reveals those pitfalls and gives practical advice to get around them.
- Avoid elongated triangles in meshes
- Rebuild deformable meshes when needed
- Be careful with degenerate triangles
- Merge and split meshes judiciously
- Optimize alpha tested meshes
Avoid elongated triangles in meshes
Long, thin triangles are rarely optimal in computer graphics, but they can be especially challenging for ray tracing acceleration structure building.
The acceleration structures are hierarchies of bounding volumes. The bounding volume of a long, thin mesh triangle contains a lot of empty space around the triangle and easily overlaps with many other triangles in the mesh. When tracing a ray against the mesh and traversing the acceleration structure, this leads to many failed intersection tests before finding the triangle that is hit by the ray (Figure 1).
Elongated triangles in various unaligned orientations amplify the problem as this tends to lead to increasing overlap between the bounding volumes. A specific mesh topology that easily leads to elongated triangles and overlapping bounding volumes is a triangle fan, or more generally, a topology where one vertex is shared by several triangles (Figure 2).
Avoiding elongated triangles by splitting them into smaller, more evenly shaped triangles results in better ray-tracing performance. Triangles that are close to being equilateral have tight bounding volumes with less empty space, which reduces the number of required ray-triangle intersection tests (Figure 3).
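One way to find candidates for splitting is to scan a mesh for triangles whose longest edge is much larger than the height onto that edge. The following Python sketch is only an illustration: the function names, the aspect-ratio metric, and the `max_ratio` threshold of 10 are assumptions made for this example, not values used by any acceleration structure builder.

```python
import math


def triangle_area(a, b, c):
    """Area of a 3D triangle from the cross product of two edges."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cx = uy * vz - uz * vy
    cy = uz * vx - ux * vz
    cz = ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def aspect_ratio(a, b, c):
    """Longest edge divided by the height onto that edge (assumed metric).

    An equilateral triangle gives 2/sqrt(3), about 1.155. Larger values mean
    a more elongated triangle. Zero-area triangles return math.inf.
    """
    longest = max(math.dist(a, b), math.dist(b, c), math.dist(c, a))
    area = triangle_area(a, b, c)
    if area == 0.0:
        return math.inf
    height = 2.0 * area / longest
    return longest / height


def find_elongated(vertices, triangles, max_ratio=10.0):
    """Indices of triangles whose aspect ratio exceeds max_ratio (assumed threshold)."""
    return [
        i
        for i, (i0, i1, i2) in enumerate(triangles)
        if aspect_ratio(vertices[i0], vertices[i1], vertices[i2]) > max_ratio
    ]
```

A report like this only tells you where to look. Whether a flagged triangle actually hurts tracing performance depends on the rest of the mesh and on how well the builder can compensate.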
Fortunately, the acceleration structure builder used by NVIDIA GPUs can internally mitigate the issues caused by elongated triangles to an extent. This means that the first non-optimal triangle in a mesh doesn’t immediately cause performance issues.
However, depending on the properties of the mesh, ray-tracing performance starts to decrease at some point as the number of elongated triangles increases. Try to avoid the problematic triangles when creating the meshes. The build flags used for the structure may also limit how well the builder can mitigate the issue. I recommend using only the flags that are needed.
Rebuild deformable meshes when needed
For meshes that deform strongly, doing a structure update can lead to non-optimal ray-tracing performance even when the triangles are not long and thin. In the cheap update operation, the structure is not thoroughly optimized, and the result can be non-optimal if the updated positions are far away from the positions used in the last rebuild.
If you see decreasing ray-tracing performance after update operations, try rebuilding the structure more frequently. For extreme cases, consider omitting the ALLOW_UPDATE build flag and doing only rebuilds. You can then try using the PREFER_FAST_BUILD flag and still potentially get faster tracing than with PREFER_FAST_TRACE and updates.
If the application knows that strong deformations happen only in certain situations, it can adapt the rebuild frequency as needed, saving build cost with cheaper updates when possible. Generally, breakable deforming objects—deforming objects where the mesh is separated into pieces potentially detached from each other—are likely to benefit from more frequent rebuilds.
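An adaptive rebuild frequency could be sketched as follows. This is a hypothetical heuristic, not something prescribed by any API: the `RebuildPolicy` class, its `max_displacement` and `max_updates` parameters, and the choice of the largest vertex displacement since the last rebuild as the trigger are all assumptions made for illustration.

```python
import math


class RebuildPolicy:
    """Choose between a cheap update and a full rebuild (hypothetical heuristic)."""

    def __init__(self, max_displacement, max_updates):
        self.max_displacement = max_displacement  # assumed limit, in scene units
        self.max_updates = max_updates            # assumed cap on consecutive updates
        self.reference = None                     # vertex positions at the last rebuild
        self.updates_since_rebuild = 0

    def choose(self, positions):
        """Return "rebuild" or "update" for this frame's vertex positions."""
        if self.reference is None or len(positions) != len(self.reference):
            return self._rebuild(positions)
        # Largest distance any vertex has moved since the last rebuild.
        worst = max(
            (math.dist(p, q) for p, q in zip(positions, self.reference)),
            default=0.0,
        )
        if worst > self.max_displacement or self.updates_since_rebuild >= self.max_updates:
            return self._rebuild(positions)
        self.updates_since_rebuild += 1
        return "update"

    def _rebuild(self, positions):
        self.reference = [tuple(p) for p in positions]
        self.updates_since_rebuild = 0
        return "rebuild"
```

The application would call `choose` once per frame for each deforming mesh and issue the matching build command. Suitable thresholds are best found by profiling, because the point where updates start to hurt depends on the mesh and the deformation.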
Be careful with degenerate triangles
Triangles that have collapsed into a line or point with zero area are considered degenerate. Deformations where some mesh triangles become degenerate or revive after being degenerate cannot be handled optimally in updates.
Ray-tracing performance against the structure can be reduced after an update with that kind of topology change. It is especially problematic if the degenerate triangles are placed in a position that doesn’t represent their position when they revive. For example, placing all degenerate triangles at the origin makes the acceleration structure potentially non-optimal.
However, degenerate triangles are not a problem for structures that don’t use the ALLOW_UPDATE build flag. For those, degenerate triangles can be handled well when building. Also, for structures that support updates, a rebuild instead of update after topology changes improves the structure significantly.
If the application knows exactly when the topology changes happen, better solutions for updatable structures are to use inactive triangles instead of degenerate triangles, or to use completely different versions of the mesh. A triangle is considered inactive when the positions of its vertices are set to NaN. For more information about inactive triangles, see DirectX Raytracing (DXR).
You can handle inactive triangles efficiently when rebuilding an updatable structure. However, the limitation in using inactive triangles is that the state of any triangle must not change between active and inactive in update operations. The application must do a rebuild when such changes happen.
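The bookkeeping behind that limitation can be sketched on the CPU side. The helpers below are hypothetical and work on a plain list of triangles, each a tuple of three `(x, y, z)` positions. Treating a triangle as inactive when any coordinate is NaN is a simplification assumed for this sketch, so check the API specification for the exact rule.

```python
import math

# All three coordinates set to NaN marks a vertex as belonging to an inactive triangle.
NAN_VERTEX = (math.nan, math.nan, math.nan)


def is_inactive(tri):
    """Sketch convention: a triangle is inactive if any vertex coordinate is NaN."""
    return any(math.isnan(c) for v in tri for c in v)


def is_degenerate(tri):
    """Zero-area triangle: the cross product of two edges is exactly the zero vector."""
    a, b, c = tri
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cross = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    return all(x == 0.0 for x in cross)


def deactivate(triangles, hidden):
    """Return a copy of the triangle list with the hidden indices marked inactive."""
    return [
        (NAN_VERTEX,) * 3 if i in hidden else tri
        for i, tri in enumerate(triangles)
    ]


def needs_rebuild(previous, current):
    """True if any triangle switched between active and inactive since `previous`."""
    if len(previous) != len(current):
        return True
    return any(is_inactive(p) != is_inactive(c) for p, c in zip(previous, current))
```

Here `previous` is the triangle data used for the last build or update. When `needs_rebuild` returns true, the application would issue a rebuild instead of an update.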
Generally, using degenerate triangles to hide or show selected parts or versions of a mesh is not an optimal solution, and neither is using them to hide dead particles in a particle system. Depending on what is possible in a specific case, consider other solutions, such as the following:
- Using different mesh versions
- Using inactive triangles
- Omitting the ALLOW_UPDATE flag
- Rebuilding updatable structure after topology changes
Merge and split meshes judiciously
Merging meshes or merging several triangle mesh geometries into one bottom level acceleration structure (BLAS) can be beneficial. Tracing a ray against a merged structure is often faster than against the separate pieces. However, due diligence should still be exercised when merging.
Typically, merging makes sense for geometries that are close to each other and do not move relative to each other. It’s especially beneficial when the bounding volumes of the geometries overlap a lot. For BLAS instances within a top level acceleration structure (TLAS), NVIDIA GPUs use a world space axis-aligned bounding box (AABB) to test whether a ray potentially hits the instance. A good case for merging is parts of the same model with different materials. Geometries merged into one BLAS can have unique materials.
On the other hand, it can be harmful to merge geometries if they are detached from each other and span over a wide area in the scene. Merging such geometries creates an AABB with lots of empty space that easily overlaps with other bounding boxes. Many rays that miss the actual geometries cross the merged bounding box. This leads to unnecessary traversal through the structure as it cannot be skipped by comparing the ray to the box. In such cases, handling the geometries as separate BLAS instances is likely more efficient.
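To make the cost of a loose merged box concrete, here is a small sketch of the classic slab test for a ray against an AABB. It is a generic textbook method, not a description of how the hardware performs the test. The box layout, a `(min_corner, max_corner)` pair, and the example cushion coordinates are made up for this illustration.

```python
def ray_hits_aabb(origin, direction, box):
    """Slab test: does the ray origin + t * direction, t >= 0, cross the box?"""
    lo, hi = box
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False
    return True


# Two small, distant geometries and the box that results from merging them.
cushion_a = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
cushion_b = ((10.0, 0.0, 0.0), (11.0, 1.0, 1.0))
merged = ((0.0, 0.0, 0.0), (11.0, 1.0, 1.0))

# A ray passing through the empty space between the two cushions.
origin, direction = (5.0, 0.5, -5.0), (0.0, 0.0, 1.0)
hits_merged = ray_hits_aabb(origin, direction, merged)
hits_parts = (
    ray_hits_aabb(origin, direction, cushion_a)
    or ray_hits_aabb(origin, direction, cushion_b)
)
```

In this example, `hits_merged` is true while `hits_parts` is false. The ray would have to enter the merged structure even though it cannot hit either geometry, whereas two separate instances would both be rejected by their bounding boxes.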
Also, merging geometries that can instance a shared BLAS is not necessarily beneficial. Instancing saves memory, which may result in improved performance too. Though again, significant overlap between instance bounding boxes may still make merging a better choice.
The geometry level of detail mechanisms can affect merging too. For example, the application may have to use the same level of detail for all geometries merged into a single BLAS to avoid rebuilding the entire structure when a new level of detail is needed for an individual geometry.
Generally, a large amount of empty space in the world space AABB of an acceleration structure instance can be a problem. The situation can be improved by either merging or splitting.
- Merging can be beneficial if there are other geometries that share transformation and have overlapping bounding boxes.
- When there are no suitable geometries for merging, consider splitting the problematic BLAS into pieces.
- As a rule of thumb, splitting and merging should aim to divide geometry into instances with tight bounding boxes with little overlap (Figures 4 and 5).
In Figure 4, pieces marked with different colors belong to different BLAS instances. For example, it would be better to merge different parts of the table or the chairs into a single BLAS as long as the parts move together.
In Figure 5, all chair cushions with the same material have been merged into a single BLAS. However, the cushions are scattered over the scene. The merged bounding box contains a lot of empty space and overlaps with many other bounding boxes. It would be better to handle the cushions either as individual instances or merged to other parts of the chairs.
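A rough way to compare merge candidates is to measure how much the combined bounding box grows relative to its parts, and how much the individual boxes overlap. The sketch below uses hypothetical helper names and box volume as the measure. Volume is an assumption chosen for readability, and it breaks down for flat geometry with zero-volume boxes, where surface area would be a more robust choice.

```python
def union(a, b):
    """Smallest AABB containing both boxes; a box is (min_corner, max_corner)."""
    (amin, amax), (bmin, bmax) = a, b
    return (
        tuple(min(x, y) for x, y in zip(amin, bmin)),
        tuple(max(x, y) for x, y in zip(amax, bmax)),
    )


def volume(box):
    """Box volume; inverted (empty) boxes count as zero."""
    lo, hi = box
    v = 1.0
    for l, h in zip(lo, hi):
        v *= max(h - l, 0.0)
    return v


def overlap_volume(a, b):
    """Volume of the intersection of two boxes, zero if they are disjoint."""
    (amin, amax), (bmin, bmax) = a, b
    lo = tuple(max(x, y) for x, y in zip(amin, bmin))
    hi = tuple(min(x, y) for x, y in zip(amax, bmax))
    return volume((lo, hi))


def merge_growth(boxes):
    """Merged AABB volume divided by the sum of the individual box volumes.

    Values near or below 1 suggest a tight merge. Large values mean the merged
    box would be mostly empty space.
    """
    merged = boxes[0]
    for b in boxes[1:]:
        merged = union(merged, b)
    total = sum(volume(b) for b in boxes)
    return volume(merged) / total if total > 0.0 else float("inf")
```

For two unit cubes that overlap by half, `merge_growth` gives 0.75. For two unit cubes placed ten units apart, like the scattered cushions in Figure 5, it gives 5.5, which hints that separate instances are the better choice.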
Optimize alpha tested meshes
Alpha testing can slow down ray tracing because invoking an any hit shader to perform the test interrupts the hardware intersection search. Optimizing the associated any hit shader or limiting the length of the cast rays may help, but the properties of the alpha-tested meshes also play a role. The cost of the alpha testing for a given ray is proportional to the number of processed surface layers along the ray.
The processing of the layers encountered by the ray does not happen in order. On the contrary, the order is unknown, and the processing must continue until the closest accepted hit is resolved. This means that having an opaque layer in front of many alpha-tested layers doesn’t necessarily allow skipping the processing of those layers.
By cutting off fully transparent areas of alpha-tested meshes, you can try to minimize the average number of alpha-tested surface layers crossed by rays. This likely means using more triangles, but that can still be a good performance tradeoff. Defining the potentially opaque areas more accurately allows you to skip alpha tests that would fail anyway.
Also, it helps to split the mesh into alpha-tested and fully opaque parts when possible. Triangles of geometries marked as opaque with the GEOMETRY_FLAG_OPAQUE flag are significantly faster to process, because the any hit shader doesn’t have to be invoked for them. The alpha-tested and opaque geometries can still be merged into the same BLAS.
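As an offline preprocessing idea, the split could be driven by alpha values sampled inside each triangle. The following is a minimal sketch under strong assumptions: `classify_triangles`, the per-triangle sample lists, and the 0.5 cutoff are hypothetical. A real tool would also need to sample the texture densely enough that no opaque or transparent texel inside a triangle is missed.

```python
def classify_triangles(triangle_alphas, cutoff=0.5):
    """Sort triangle indices into opaque, alpha-tested, and removable sets.

    triangle_alphas[i] holds alpha values sampled inside triangle i (assumed
    input). A triangle with no samples stays alpha-tested, which is the
    conservative choice.
    """
    opaque, alpha_tested, transparent = [], [], []
    for index, samples in enumerate(triangle_alphas):
        if not samples:
            alpha_tested.append(index)
        elif min(samples) >= cutoff:
            opaque.append(index)        # every sample passes the alpha test
        elif max(samples) < cutoff:
            transparent.append(index)   # every sample fails: candidate for removal
        else:
            alpha_tested.append(index)  # mixed coverage: keep the any hit shader
    return opaque, alpha_tested, transparent
```

The opaque indices would go into a geometry marked as opaque, the alpha-tested ones into a geometry that keeps its any hit shader, and the fully transparent ones can be dropped from the mesh.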
Triangle bounding volume overlap is typical in some alpha-tested objects, like vegetation, which tends to contain triangles in various unaligned orientations. Also, instance bounding volume overlap can easily happen when placing vegetation in natural formations (Figure 6). These issues can further increase tracing cost against alpha-tested geometry. Using smaller triangles and merging the geometries into fewer BLASes can mitigate these issues, as I discussed earlier.
Whatever the granularity at which you look at your triangles or meshes, remember that when you trace rays, you traverse bounding volume hierarchies. To minimize the required steps in the hierarchy and intersection tests, optimize the bounding volumes around your object instances or triangles to be as tight as possible. Minimize empty space. Minimize overlap. Above all, minimize them for anything containing non-opaque geometries.