Games often precompute ambient occlusion (AO) or other static lighting and bake the results into vertex or texture data that is loaded into OpenGL or DirectX shaders later. Ray tracing is the core computation of such a baking pipeline. However, writing a production-quality GPU ray tracer from scratch takes a fair amount of time and expertise.
The NVIDIA OptiX Ray Tracing Engine shortens the development cycle for a GPU baking application. You define the rays and what happens when a ray hits a surface; OptiX does the heavy lifting of intersecting and shading rays in parallel. Baking shadows or occlusion with OptiX is even simpler: you don't need to define any surface shading, you just want to know if a ray hit something (and possibly what and where it hit, in some cases). For these types of pure geometric queries, we have the OptiX Prime API (Prime, for short). You define the rays and triangles, Prime builds a fast TRBVH acceleration structure [1] then shoots rays against triangles and returns the results to you. That's it.
To get you started with Prime, we've posted sample code on GitHub for computing ambient occlusion on the surface of a mesh and baking the results onto vertices. This is part of the nvpro-samples repository, a collection of code for developers in the game and design industries. You're free to use the code directly (it's BSD licensed), and we hope it serves as a starting point for related things like baking onto textures. The sample could also be extended to use the full OptiX SDK if you have geometry other than triangles, or need recursive ray tracing and shading.
We'll go through the high level steps of the algorithm here. Before reading on, you might want to go grab the code from GitHub to follow along. If you're a veteran coder with a baking tool already under your belt, the code may be all you need: the high level steps of the algorithm correspond to blocks of commented code in the main function. The README explains how to build the sample on Linux or Windows if you want to do that up front too; you'll need CMake, the OptiX binaries, and a few other dependencies depending on your platform.
For reference in case you're not following along with the sample, here are the current command line options:
App options:
-h | --help Print this usage message
-f | --file <scene_file> Specify model to be rendered (obj, bk3d, bk3d.gz, csf, csf.gz).
-o | --outfile <vertex_ao_file> Specify raw file where per-instance ao vertices are stored (very basic fileformat).
-i | --instances <n> Number of instances per mesh (default 1). For testing.
-r | --rays <n> Number of rays per sample point for gather (default 64)
-s | --samples <n> Number of sample points on mesh (default 3 per face; any extra samples are based on area)
-t | --samples_per_face <n> Minimum number of samples per face (default 3)
-d | --ray_distance_scale <s> Distance offset scale for ray from face: ray offset = maximum scene extent * s. (default 0.01)
-m | --hit_distance_scale <s> Maximum hit distance to contribute: max distance = maximum scene extent * s. (default 10)
-g | --ground_setup <axis> <s> <o> Ground plane setup: axis(int 0,1,2,3,4,5 = +x,+y,+z,-x,-y,-z) scale(float) offset(float). (default 1 100 0.03)
--no_ground_plane Disable virtual ground plane
--no_viewer Disable OpenGL viewer
-w | --regularization_weight <w> Regularization weight for least squares, positive range. (default 0.1)
--no_least_squares Disable least squares filtering
Viewer keys:
e Draw mesh edges on/off
f Frame scene
q Quit
Algorithm Overview
- Load the scene. Loaders are provided for OBJ, Bak3d, and CSF (a basic CAD scene file format used in various nvpro samples). The OBJ loader flattens all groups into a single mesh. The Bak3d and CSF loaders preserve separate meshes, each with its own transform. Use the utilities in the Bak3d repo to convert other formats, or write a new loader for your favorite format and add it to the "loaders" subdirectory.
- Generate AO sample points all over each mesh. We'll flesh this and other steps out more below.
- Evaluate AO samples by shooting a bunch of rays from each one.
- Map AO to vertices and optionally save a file.
- Visualize results in a standard OpenGL viewer as a vertex attribute.
Now for the details. In the sample code, each of the steps below is implemented in its own .cpp file and exposed through the bake_api.h header.
We'll use the Rocket Sled scene as our main example; by default this is downloaded in Bak3d format automatically when you first configure CMake.
The scene has about 418K triangles, with some variation in triangle size and shape depending on the curvature of the surface, which will make the "Map AO to vertices" step a little more interesting. There are 109 separate meshes in this scene, and to make the example more general we leave these as separate instances, each with their own transform, rather than flattening into a single mesh. Here's an exploded view showing the different meshes:
Generate AO Sample Points
The first step is to cover the surface of every mesh in the scene with sample points. We'll be shooting rays from these later. We place a minimum number of points per triangle (default 3), to ensure that every triangle gets some points. In many scenes this is good enough, especially if the triangles are all about the same size and shape, so this is the default behavior. For scenes with large triangles, or triangles with very different sizes, there are command line options to place more points using either a fixed number of points per triangle, or according to the area of the triangle.
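To make the sampling budget concrete, here is a minimal C++ sketch of one way to split a total sample count across triangles: every triangle receives the minimum, and whatever is left is distributed in proportion to area. The function name `allocate_samples` and the round-down policy are assumptions made for illustration; the sample code may distribute the extra points differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not taken from the sample): decide how many sample
// points each triangle receives. Every triangle gets `min_per_face`; any
// remaining budget is spread in proportion to triangle area.
std::vector<std::size_t> allocate_samples(const std::vector<double>& areas,
                                          std::size_t min_per_face,
                                          std::size_t total_samples)
{
    std::vector<std::size_t> counts(areas.size(), min_per_face);
    const std::size_t base = min_per_face * areas.size();
    if (total_samples <= base) return counts;  // the minimum already uses the budget

    const double total_area = std::accumulate(areas.begin(), areas.end(), 0.0);
    if (total_area <= 0.0) return counts;      // degenerate mesh: keep the minimum only
    assert(total_area > 0.0);

    const std::size_t extra = total_samples - base;
    for (std::size_t i = 0; i < areas.size(); ++i) {
        // Round down; a few leftover samples may stay unassigned in this sketch.
        const double share = static_cast<double>(extra) * (areas[i] / total_area);
        counts[i] += static_cast<std::size_t>(std::floor(share));
    }
    return counts;
}
```

With two triangles of area 1 and 3, a minimum of 3, and a budget of 14, the larger triangle ends up with 9 points and the smaller with 5.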
Here are some of the default points (3 per triangle) for the Rocket Sled:
By the way, if you're interested in the math, the PBRT book, 2nd edition, section 13.6.4, derives how to pick random barycentric coordinates on a triangle. Given two random numbers r1, r2 in the unit square, the mapping turns out to be this:
u = 1 - sqrt(r1)
v = r2 * sqrt(r1)
w = 1 - u - v
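The mapping above translates directly into code. The following is a small C++ sketch; the type and function names (`Barycentric`, `sample_barycentric`, `point_on_triangle`) are made up for this example and are not the sample's API.

```cpp
#include <cassert>
#include <cmath>

struct Barycentric { double u, v, w; };
struct Vec3 { double x, y, z; };

// Map two uniform random numbers in [0, 1] to barycentric coordinates that
// are uniformly distributed over the triangle.
Barycentric sample_barycentric(double r1, double r2)
{
    assert(r1 >= 0.0 && r1 <= 1.0 && r2 >= 0.0 && r2 <= 1.0);
    const double s = std::sqrt(r1);
    const double u = 1.0 - s;
    const double v = r2 * s;
    return Barycentric{u, v, 1.0 - u - v};
}

// Turn barycentric coordinates into a position on triangle (a, b, c).
Vec3 point_on_triangle(const Barycentric& bc, const Vec3& a, const Vec3& b, const Vec3& c)
{
    return Vec3{bc.u * a.x + bc.v * b.x + bc.w * c.x,
                bc.u * a.y + bc.v * b.y + bc.w * c.y,
                bc.u * a.z + bc.v * b.z + bc.w * c.z};
}
```

The square root is what keeps the points from bunching up near one vertex; using r1 and r2 directly as u and v would not give a uniform distribution.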
Evaluate AO Samples
To estimate the ambient occlusion value at a sample point, we shoot a bunch of shadow rays. More technically, we shoot shadow rays over the upper hemisphere, defined by the surface normal, using a cosine distribution, and average the results into a scalar occlusion value. See the Wikipedia entry on Ambient Occlusion for reference. Since the total number of rays is another command line option and could possibly be large enough to exceed the available GPU memory, we first split the set of sample points into batches, hard coded to 2M points. The set of rays per point is further split into passes, which are run on the device one at a time. Each pass generates a single ray per sample point. Here's an example showing two passes on a single triangle:
Visualization of rays for two passes of a 64-pass solution. Ray directions are jittered but coherent within a pass. Ray origins are biased slightly to avoid self intersection.
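For readers who want to see what "a cosine distribution over the upper hemisphere" looks like in code, here is a minimal C++ sketch using the common disk-projection approach (often called Malley's method). It is not necessarily what the sample does internally: the helper names and the way the tangent frame is built are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 cross(const Vec3& a, const Vec3& b)
{
    return Vec3{a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

Vec3 normalize(const Vec3& v)
{
    const double len = std::sqrt(dot(v, v));
    assert(len > 0.0);
    return Vec3{v.x / len, v.y / len, v.z / len};
}

// Cosine-weighted direction in a local frame where +z is the surface normal.
// r1, r2 are uniform random numbers in [0, 1].
Vec3 cosine_sample_local(double r1, double r2)
{
    const double kPi = 3.14159265358979323846;
    const double r = std::sqrt(r1);        // uniform point on the unit disk...
    const double phi = 2.0 * kPi * r2;
    const double z = std::sqrt(std::fmax(0.0, 1.0 - r1));  // ...projected up onto the hemisphere
    return Vec3{r * std::cos(phi), r * std::sin(phi), z};
}

// Rotate a local direction into world space around the unit normal n.
Vec3 to_world(const Vec3& local, const Vec3& n)
{
    // Pick a helper axis that is not parallel to n, then build a tangent frame.
    const Vec3 helper = std::fabs(n.x) > 0.9 ? Vec3{0.0, 1.0, 0.0} : Vec3{1.0, 0.0, 0.0};
    const Vec3 t = normalize(cross(helper, n));
    const Vec3 b = cross(n, t);
    return Vec3{local.x * t.x + local.y * b.x + local.z * n.x,
                local.x * t.y + local.y * b.y + local.z * n.y,
                local.x * t.z + local.y * b.z + local.z * n.z};
}
```

Because the directions are already distributed according to the cosine term, the occlusion estimate reduces to a plain average of the per-ray hit results.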
Users can adjust the ray bias and length using command line options. In the sample these options are "ray_distance_scale" and "hit_distance_scale", both in units of scene scale, not world scale (we want the same numbers to work for different scenes).
A performance note: shooting shadow rays backward (starting outside the scene and shooting the ray back toward a sample point) often increases performance, by about 20% on the Rocket Sled. Since "forward" shadow rays start near a sample point on a triangle, there are extra ray-box and ray-triangle tests near the origin that usually don't result in a hit (after all, we biased the ray off the surface a little to avoid exactly that self intersection). The backward ray has at least a chance of hitting something else first, stopping early, and avoiding the extra work. The downside is reduced precision in the bias, since we're subtracting a small bias from the "tmax" value of the ray (a larger number). The loss of precision was not visually noticeable for this scene with default settings, but you can switch to forward rays if you like by changing a few lines of code in the sample and rebuilding.
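The relationship between the two ray directions can be sketched in a few lines of C++. This is an illustration under stated assumptions, not the sample's code: the `Ray` struct, the function names, and the choice to express the bias as a parametric offset along the ray are all hypothetical. Both rays cover the same segment; only the traversal direction and the place where the bias is applied differ.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

struct Ray {
    Vec3 origin;
    Vec3 dir;     // assumed unit length
    float tmin;
    float tmax;
};

// Forward shadow ray: leaves sample point p along d, skipping the first
// `bias` units to avoid self intersection.
// Assumption: bias = scene extent * ray_distance_scale,
//             max_dist = scene extent * hit_distance_scale.
Ray make_forward_ray(const Vec3& p, const Vec3& d, float bias, float max_dist)
{
    assert(bias >= 0.0f && max_dist > bias);
    return Ray{p, d, bias, max_dist};
}

// Backward shadow ray over the same segment, traversed from the far end
// back toward p. The bias now shortens tmax, which is where precision is
// lost: a small number is subtracted from a much larger one.
Ray make_backward_ray(const Vec3& p, const Vec3& d, float bias, float max_dist)
{
    assert(bias >= 0.0f && max_dist > bias);
    const Vec3 far_end{p.x + d.x * max_dist, p.y + d.y * max_dist, p.z + d.z * max_dist};
    return Ray{far_end, Vec3{-d.x, -d.y, -d.z}, 0.0f, max_dist - bias};
}
```

With the default scales and the Rocket Sled's scene scale of 13.55, the bias would be roughly 0.14 units against a maximum distance of 135.5 units, which gives a feel for the magnitudes involved in the subtraction.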
Instances
We also support instances, which are pairs of meshes and transforms. A mesh referenced by multiple instances is only stored once in memory. Note: Some file formats (Bak3d, CSF) support transforms; the OBJ format does not, so when loading from OBJ we flatten all groups into a single mesh.
In practice instancing is useful for things like vegetation or cities that can be decomposed into basic building blocks of geometry during modeling. The sample has a command line option, intended for debugging, to instance every mesh in the scene some number of times with a procedural transform. An example is shown below. The sample points and the rays they spawn are still unique per instance, to avoid any subtle repeated noise patterns in the results.
Note that for performance reasons, OptiX Prime supports at most one level of instancing: each instance references a mesh, not another instance. The full OptiX API supports multi-level instancing.
Blockers
If you've been looking at the AO images carefully, you might have guessed that we added an invisible ground plane marked as a blocker to most scenes. Blockers are just that -- they occlude rays but do not receive sample points of their own. Blockers help integrate objects that are baked separately in a production pipeline and keep objects from looking like they're floating in space. This is perhaps easiest to see on character meshes like this one [3]:
Map AO to Vertices
The last step of our baking pipeline is to write out the occlusion values in a format easily consumed later during rendering. The following methods are all common in film and game rendering:
- Point clouds or 3d textures (.ptc or .bkm files in Renderman, for example)
- 2d textures
- Vertex attributes
Point clouds and textures retain the detail from the original sample points, but require more memory for storage. Point clouds are also less efficient to access in hardware. Exporters for both these formats could be built on top of the current sample code. Currently, however, we only write to vertex attributes. This requires mapping occlusion values from the sample points onto the vertices.
This resampling problem is easier to think about in lower dimensions. Instead of samples on a triangle, consider some samples along a line segment parameterized by t:
t AO
-----------
0.125 0.6
0.375 0.5
0.625 0.7
0.875 1.0
We want to somehow accumulate these 4 samples onto the endpoints ("vertices") of the segment, which are at t=0 and t=1. One simple method is to weight each sample by its t-value along the segment, and also by the area of the sample, which is dA = 0.25 since the 4 samples are evenly spaced. So, the first sample would accumulate (1 - 0.125)*(0.6)*(dA) onto the left vertex and (0.125)*(0.6)*(dA) onto the right vertex. Repeating this for the other 3 samples and then normalizing by the total weights produces final vertex values that, when interpolated, give this line:
Sample points (black dots) and interpolation (green line) of the weighted average at the two endpoints
The weighted average method has the advantage of being simple, and it works well enough for scenes with very uniform, dense triangles. It also extends easily to 3d, where the parametric distance along a line becomes a barycentric distance along a triangle.
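The 1d weighted average is small enough to write out in full. The C++ sketch below follows the description above for evenly spaced samples; the names `Sample1D` and `weighted_average_endpoints` are invented for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Sample1D { double t; double ao; };

// Returns {value at t=0, value at t=1} from a weighted average of the samples.
// Assumes evenly spaced samples, so each one covers dA = 1/N of the segment,
// and assumes not all samples sit exactly on one endpoint.
std::pair<double, double> weighted_average_endpoints(const std::vector<Sample1D>& samples)
{
    assert(!samples.empty());
    const double dA = 1.0 / static_cast<double>(samples.size());
    double left_sum = 0.0, left_weight = 0.0;
    double right_sum = 0.0, right_weight = 0.0;
    for (const Sample1D& s : samples) {
        const double wl = (1.0 - s.t) * dA;  // closer to t=0 means more weight on the left
        const double wr = s.t * dA;          // closer to t=1 means more weight on the right
        left_sum += wl * s.ao;
        left_weight += wl;
        right_sum += wr * s.ao;
        right_weight += wr;
    }
    return {left_sum / left_weight, right_sum / right_weight};
}
```

For the four samples in the table above, this gives about 0.6125 at t=0 and 0.7875 at t=1. Both values sit well inside the range of the samples, which is why the interpolated line looks flatter than the data.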
Another approach is to think of this as a fitting problem, and look for a least squares solution. In other words, if the original samples look kind of like a line (or a plane in 3d), and you're going to approximate them using linear interpolation from vertices to fragments in an OpenGL shader, then find the linear parameters that give the best fit:
Least squares fit (blue line) to the sample points
This solution matches the slope of the samples better. It also goes outside the range of the original AO values a little.
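For the 1d example, the least squares fit is an ordinary straight-line regression, which can be sketched in C++ as follows. This is only the toy segment case with no regularization, not a mesh solver, and the names are invented for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Sample1D { double t; double ao; };

// Fit ao ~ intercept + slope * t by ordinary least squares and return the
// fitted values at the segment endpoints, {value at t=0, value at t=1}.
std::pair<double, double> least_squares_endpoints(const std::vector<Sample1D>& samples)
{
    assert(samples.size() >= 2);
    const double n = static_cast<double>(samples.size());
    double sum_t = 0.0, sum_ao = 0.0, sum_tt = 0.0, sum_t_ao = 0.0;
    for (const Sample1D& s : samples) {
        sum_t += s.t;
        sum_ao += s.ao;
        sum_tt += s.t * s.t;
        sum_t_ao += s.t * s.ao;
    }
    const double denom = n * sum_tt - sum_t * sum_t;
    assert(denom != 0.0);  // needs at least two distinct t values
    const double slope = (n * sum_t_ao - sum_t * sum_ao) / denom;
    const double intercept = (sum_ao - slope * sum_t) / n;
    return {intercept, intercept + slope};
}
```

For the four samples in the table, the fit gives about 0.42 at t=0 and 0.98 at t=1. The left endpoint falls below the smallest sample value of 0.5, which is the overshoot mentioned above.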
In a research paper called "Least Squares Vertex Baking" (LSVB) [2], Kavan et al. first applied least squares fitting to occlusion baking. The math becomes more complicated on the surface of a mesh, and more computationally demanding; you need a numerical linear algebra library like Eigen to solve it efficiently. Check the paper for more details and examples.
One important ingredient of LSVB in practice is a regularization weight parameter, which helps keep the solution smooth in areas where the mesh is smooth. Without regularization, the pure least squares solution may not always fit our intuition about what smooth AO should look like. Regularization is also good at smoothing away noise in cases where there are not enough rays and/or sample points.
LSVB is not critical for the Rocket Sled if we use a lot of sample points and rays, but it fixes some artifacts like the ones in the vertical shadow in the middle of this image:
Result with 5M sample points and 256 rays per point. Left: weighted average. Right: least squares.
It's also interesting to apply LSVB to a slightly undersampled result using many fewer rays and samples:
Result with 1.2M sample points and 64 rays per point. Left: weighted average shows artifacts from insufficient numbers of samples and rays. Right: least squares smooths away noise.
In the CMake setup for the sample, if the Eigen library is found, then we build code for least squares fitting and use it by default. Pass the "no_least_squares" flag to use the faster weighted average method.
Output Format
In addition to direct visualization, we optionally write the per-instance vertex-AO values to a binary file (the "outfile" option), using a format straight from the code. Adapt this to your pipeline.
{
    uint64_t num_instances;
    uint64_t num_vertices;
    struct Instance {
        uint64_t storage_identifier;  // set by loader
        uint64_t offset_vertices;     // at which index to start in vertex array
        uint64_t num_vertices;        // how many ao vertex values used by instance
    } instances[num_instances];
    float ao_values[num_vertices];    // one value per vertex
}
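A reader for this layout might look like the C++ sketch below. It assumes the file is tightly packed in exactly the order shown, with no padding, and is read on a machine with the same endianness as the one that wrote it. The names `InstanceRecord`, `AoFile`, and `read_ao_file` are hypothetical and do not come from the sample.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

struct InstanceRecord {
    std::uint64_t storage_identifier;  // set by loader
    std::uint64_t offset_vertices;     // first index in the AO array for this instance
    std::uint64_t num_vertices;        // number of AO values used by this instance
};
static_assert(sizeof(InstanceRecord) == 24, "expected three packed 64-bit fields");

struct AoFile {
    std::vector<InstanceRecord> instances;
    std::vector<float> ao_values;      // one value per vertex
};

// Reads the whole file into `out`. Returns false on any I/O failure.
// The counts are trusted as-is; a production reader should validate them.
bool read_ao_file(const std::string& path, AoFile& out)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;

    std::uint64_t num_instances = 0;
    std::uint64_t num_vertices = 0;
    in.read(reinterpret_cast<char*>(&num_instances), sizeof num_instances);
    in.read(reinterpret_cast<char*>(&num_vertices), sizeof num_vertices);
    if (!in) return false;

    out.instances.resize(static_cast<std::size_t>(num_instances));
    out.ao_values.resize(static_cast<std::size_t>(num_vertices));
    if (num_instances > 0) {
        in.read(reinterpret_cast<char*>(out.instances.data()),
                static_cast<std::streamsize>(num_instances * sizeof(InstanceRecord)));
    }
    if (num_vertices > 0) {
        in.read(reinterpret_cast<char*>(out.ao_values.data()),
                static_cast<std::streamsize>(num_vertices * sizeof(float)));
    }
    return static_cast<bool>(in);
}
```

The AO values for instance i would then be the range starting at `ao_values[instances[i].offset_vertices]` and spanning `instances[i].num_vertices` entries.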
Performance
Timings for the Rocket Sled scene on an NVIDIA Quadro M6000 GPU are shown below. Total baking time from start to finish is under 2 seconds, and the performance of just the ray query ("accum query" line) is about 124 Mrays/sec on this particular scene, for secondary rays. (Performance may vary on other scenes.) The least squares filtering step is not currently GPU accelerated, although we're interested in this as future work. Right now we thread the filtering step over the 109 meshes composing the sled using OpenMP.
../../bin_x64/optix_prime_baking -w 1
Load scene ... 96.52 ms
Loaded scene: sled_v134.bk3d.gz
109 meshes, 109 instances
uninstanced vertices: 348015
uninstanced triangles: 418036
Minimum samples per face: 3
Generate sample points ...
38.11 ms
Total samples: 1254108
Rays per sample: 64
Total rays: 80262912
Compute AO ... scene scale = 13.55
setup ... 502.66 ms
accum raygen ... 0.20 ms
accum query ... 644.05 ms
accum update AO ... 0.36 ms
copy AO out ... 0.51 ms
1160.71 ms
Map AO to vertices ...
build mass matrices ... 108.82 ms
build regularization matrices ... 203.75 ms
decompose matrices ... 166.96 ms
solve linear systems ... 13.69 ms
315.44 ms
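As a quick check of the quoted ray throughput, the numbers in the log can be combined by hand. The total ray count is the number of samples times the rays per sample, and the throughput is that total divided by the "accum query" time:

```latex
1{,}254{,}108 \times 64 = 80{,}262{,}912 \ \text{rays}
\qquad
\frac{80{,}262{,}912 \ \text{rays}}{0.64405 \ \text{s}} \approx 1.246 \times 10^{8} \ \text{rays/s} \approx 124.6 \ \text{Mrays/s}
```

The sample count itself is consistent with the default of 3 samples per face: 418,036 triangles times 3 is 1,254,108.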
Last Word
That concludes our quick tour of baking with Prime. Thanks for reading! If you start working on an application and find yourself wanting more support, download the full OptiX SDK -- it's free -- and start posting on the OptiX forums or email lists. The SDK includes more introductory Prime samples as well as full OptiX API samples for interactive ray tracing.
References
- [1] Fast Parallel Construction of High-Quality Bounding Volume Hierarchies. T. Karras and T. Aila, HPG 2013.
- [2] Least Squares Vertex Baking. L. Kavan, A.W. Bargteil, and P.-P. Sloan, EGSR 2011.
- [3] NVIDIA Hunter mesh credits: Adam Pintek, David Wright, Jason Walker, Mike Jensen.