Latest NVIDIA OptiX Renders Ray Tracing Faster Than Ever Before

The NVIDIA OptiX Ray Tracing Engine is a scalable and seamless framework that offers optimal ray tracing performance on GPUs. In this fall update to the NVIDIA OptiX SDK, developers can leverage new compilation techniques and improved layered and temporal denoising to handle more ray tracing workloads, faster.

Faster compile times

NVIDIA OptiX 7.4 comes with a new feature to support the parallel compilation of OptixModule objects containing multiple functions. The work is expressed as task objects returned from the API, which can be executed concurrently to achieve parallelism. Additional tasks are returned when further opportunities for parallelism are found. Threading is handled outside of NVIDIA OptiX: the application executes the tasks on threads it manages, which makes it easier to integrate the parallel work into existing job schedulers.
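To make the task model concrete, here is a minimal sketch of the pattern in plain C++. It does not call the NVIDIA OptiX API: the `Task` type, `makeModuleTask`, and `runTasks` are hypothetical stand-ins, and the "compile" step only increments a counter. The sketch shows application-owned threads pulling tasks from a shared queue, where running one task may hand back more tasks.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a compile task. Running it does some work and may
// append follow-up tasks that are safe to run concurrently.
struct Task {
    std::function<void(std::vector<Task>&)> run;
};

// Hypothetical "module" task: it splits into one task per function.
// Each per-function task only bumps a counter in place of real compilation.
Task makeModuleTask(int numFunctions, std::atomic<int>& compiled) {
    return Task{[numFunctions, &compiled](std::vector<Task>& more) {
        for (int i = 0; i < numFunctions; ++i)
            more.push_back(Task{[&compiled](std::vector<Task>&) { ++compiled; }});
    }};
}

// Runs the root task and everything it spawns on threads owned by the
// application. Returns the total number of tasks executed.
std::size_t runTasks(Task root, unsigned numThreads) {
    std::mutex m;
    std::condition_variable cv;
    std::deque<Task> queue;
    std::size_t inFlight = 0;  // tasks currently executing
    std::size_t executed = 0;  // tasks finished so far
    queue.push_back(std::move(root));

    auto worker = [&] {
        std::unique_lock<std::mutex> lock(m);
        for (;;) {
            // Sleep until there is work, or until nothing is queued or running.
            cv.wait(lock, [&] { return !queue.empty() || inFlight == 0; });
            if (queue.empty()) return;  // all work is done
            Task t = std::move(queue.front());
            queue.pop_front();
            ++inFlight;
            lock.unlock();

            std::vector<Task> more;
            t.run(more);  // execute outside the lock

            lock.lock();
            for (auto& next : more) queue.push_back(std::move(next));
            --inFlight;
            ++executed;
            cv.notify_all();
        }
    };

    std::vector<std::thread> threads;
    for (unsigned i = 0; i < numThreads; ++i) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
    return executed;
}
```

Calling `runTasks(makeModuleTask(8, counter), 4)` executes the module task plus its eight per-function tasks on four threads. In a real application, the worker loop would typically be replaced by the job scheduler the application already has.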

Figure 1. Deserted house in the night, rendered in Redshift. Image courtesy of Daz3D.

Improved customization and performance

NVIDIA OptiX 7.4 has increased the size of the ray payload from 8 to 32 registers. The payload registers are the mechanism NVIDIA OptiX offers for passing arbitrary data along with a ray, from the caller that traces the ray, and back. This payload mechanism is very lightweight, similar to passing arguments to a function. The previous limit of 8 registers forced many developers to use local or even global memory buffers for passing ray data around, which can have a negative impact on performance.
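As a rough illustration of what the larger payload makes possible, the sketch below models payload registers as an array of 32-bit words in plain C++. The `PathState` struct and the `pack`/`unpack` helpers are hypothetical and are not part of the NVIDIA OptiX API; the point is only that a 14-word per-ray state exceeds the old 8-register limit but fits within 32, so it no longer needs a side buffer in memory.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Payload limits taken from the article: 8 registers before, 32 in OptiX 7.4.
constexpr std::size_t kOldPayloadWords = 8;
constexpr std::size_t kMaxPayloadWords = 32;

// Hypothetical per-ray state for a path tracer.
struct PathState {
    float radiance[3];
    float throughput[3];
    float nextOrigin[3];
    float nextDirection[3];
    std::uint32_t rngSeed;
    std::uint32_t depth;
};

constexpr std::size_t kPathStateWords = sizeof(PathState) / sizeof(std::uint32_t);

static_assert(std::is_trivially_copyable<PathState>::value,
              "state must be safe to copy bit-for-bit");
static_assert(sizeof(PathState) % sizeof(std::uint32_t) == 0,
              "state must be a whole number of 32-bit words");
static_assert(kPathStateWords > kOldPayloadWords,
              "this example is meant to exceed the old limit");
static_assert(kPathStateWords <= kMaxPayloadWords,
              "state must fit in the available payload registers");

// Model of the payload registers: one 32-bit word per register.
using Payload = std::array<std::uint32_t, kMaxPayloadWords>;

// Copy the state into payload words (the caller side of a trace).
Payload pack(const PathState& state) {
    Payload words{};
    std::memcpy(words.data(), &state, sizeof(state));
    return words;
}

// Rebuild the state from payload words (the receiving side).
PathState unpack(const Payload& words) {
    PathState state;
    std::memcpy(&state, words.data(), sizeof(state));
    return state;
}
```

The `static_assert` checks are the useful part of the design: if someone later grows `PathState` past the register budget, the build fails instead of the data being silently truncated.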

The larger payload lets a greater variety of applications pass ray data using only registers. However, using more registers increases register pressure and can result in spilling to memory, so the NVIDIA OptiX 7.4 payload increase also comes with new API functions that help developers optimize their payload usage. The new API allows you to declare how you use payload values so that the compiler can reuse registers whenever possible.

Curves additions and optimizations

We also introduce a new curve primitive: the Catmull-Rom cubic curve. This is a popular type of interpolating curve that passes directly through its control points, which allows fine-tuned placement of the curves when precise control is desired. Catmull-Rom curves are a popular choice in the film and games industries for hair, fur, and other uses of curves. The other curve types that NVIDIA OptiX already supports are the cubic and quadratic B-spline curves and the linear curve. The B-spline is an approximating curve that is a little smoother than the Catmull-Rom curve, but it generally does not pass directly through its control points.

In addition to the new curve primitive, NVIDIA OptiX has added an option for controlling whether cubic and quadratic curves have open or closed ends. Open-ended curves avoid the shader divergence caused by special-case handling of endcap normals. Because NVIDIA OptiX curves are currently back-face culled, rays that enter through the open end of a curve miss the curve completely. Open-ended curves are common and also better for performance, so NVIDIA OptiX 7.4 changed the default endcap behavior of cubic and quadratic curves to open-ended.

Previously, these curves had flat, closed, disc-shaped endcaps, which are sometimes useful for applications that require careful control of ray-curve behavior or need to prevent rays from passing through curves. Both the B-spline and the Catmull-Rom curves share control points from segment to segment along connected strands in order to save memory.

Speaking of saving memory, NVIDIA OptiX 7.4 has enabled adaptive sampling for curves, which by default both reduces memory and improves performance. Memory usage can be critical for furry creatures that have millions of curve strands. For the most performance-minded with memory to spare, curves can be rendered much faster by using the build flag OPTIX_BUILD_FLAG_PREFER_FAST_TRACE. With this flag, adaptive sampling uses slightly more memory than before (around 10%) in exchange for a rendering performance boost of 25%-50% on average, and as much as 70% in some of our tests.

Two major denoiser features come together

The denoiser in NVIDIA OptiX 7.4 continues to improve in quality and speed. With this version of NVIDIA OptiX, developers can combine the previous two major denoiser feature upgrades: AOV (or layered) denoising, and temporal denoising. AOV denoising is a feature for denoising multiple arbitrary render layers at the same time, and it is much more efficient than denoising each layer separately. Denoising multiple layers at the same time can also give a significant quality improvement, because the denoising filter choices are kept consistent across all the layers. This means fewer denoising artifacts will be visible once you composite the layers back together.

Denoising layers separately can result in different filter choices being made independently for each layer, which can make certain layers more prone to visible artifacts. This is especially true for very sparse layers, such as a specular pass that is mostly empty.

Temporal denoising, introduced in NVIDIA OptiX 7.3, keeps denoising coherent from frame to frame in an animated sequence. If you denoise frames individually, each one might look great on its own, but when you play them back as an animation it is common to get flickering artifacts, because the denoiser makes independent choices on every frame. Temporal AOV denoising gives you both features in one package: coherence in denoising across different layers and across animated sequences.

Increased scale with demand loading

The demand loading library in NVIDIA OptiX is being released with a cache eviction feature, which dynamically replaces cached tiles in order to save even more memory. The team is seeking feedback on this new feature, so if you use it, please reach out through the NVIDIA forums or by e-mail and let the NVIDIA OptiX team know how well it is working for you. Bringing the eviction feature to a production-ready state, along with other improvements, is planned for future versions of NVIDIA OptiX.
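For readers unfamiliar with cache eviction, the sketch below shows the general idea with a fixed-capacity tile cache in plain C++. It is not the demand loading library's implementation: the `TileCache` class is hypothetical, and the least-recently-used policy is an assumption chosen for illustration, since the article does not say which replacement policy the library uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Hypothetical fixed-capacity tile cache with least-recently-used eviction.
class TileCache {
public:
    explicit TileCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns true if the tile was already resident (a hit). On a miss, the
    // tile becomes resident, evicting the least recently used tile if the
    // cache is full. Actual tile loading is out of scope for this sketch.
    bool request(std::uint64_t tileId) {
        auto it = index_.find(tileId);
        if (it != index_.end()) {
            // Hit: move the tile to the front (most recently used).
            lru_.splice(lru_.begin(), lru_, it->second);
            return true;
        }
        if (capacity_ == 0) return false;  // nothing can be cached
        if (lru_.size() == capacity_) {
            // Full: drop the tile at the back (least recently used).
            index_.erase(lru_.back());
            lru_.pop_back();
            ++evictions_;
        }
        lru_.push_front(tileId);
        index_[tileId] = lru_.begin();
        return false;
    }

    bool resident(std::uint64_t tileId) const { return index_.count(tileId) != 0; }
    std::size_t evictions() const { return evictions_; }

private:
    std::size_t capacity_;
    std::size_t evictions_ = 0;
    std::list<std::uint64_t> lru_;  // front = most recently used
    std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> index_;
};
```

With a capacity of two tiles, requesting tiles 1, 2, 1, 3 in that order evicts tile 2: tile 1 was touched more recently, so it stays resident. Capping residency this way is what lets the memory footprint stay bounded while the set of tiles a scene touches keeps changing.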