GPU Gems: Part II - Lighting and Shadows

GPU Gems

GPU Gems is now available, right here, online. You can purchase a beautifully printed version of this book, and others in the series, at a 30% discount courtesy of InformIT and Addison-Wesley.

Please visit our Recent Documents page to see all the latest whitepapers and conference presentations that can help you with your projects.



Part II: Lighting and Shadows


From surface shaders that determine how surface parameters and scene parameters are combined to produce color, to algorithms that organize scene objects in an efficient manner, the choice of a lighting and shadowing algorithm often has the single greatest impact on the design of your rendering engine. Choosing an algorithm influences more than just the look of your scenes: it affects the way content is authored and how complex and interactive your scenes can be. The chapters in this part of the book describe various algorithms for lighting and shadowing, along with techniques for making these algorithms more efficient and robust.

In Chapter 9, "Efficient Shadow Volume Rendering," Morgan McGuire thoroughly covers the popular stencil shadow volume technique for rendering real-time shadows. Stencil shadow volumes, although often simple to implement initially, are notoriously difficult to make robust and fast. This chapter focuses on getting the corner cases right and reducing the geometry and fill-rate costs of the technique.

Fabio Pellacini and Kiril Vidimce, in Chapter 10, "Cinematic Lighting," present a general lighting shader based on a shader used by Pixar Animation Studios but simplified for real-time lighting. This uberlight shader, as it is known, was written with the fundamental goal of giving control over as many lighting parameters as possible to the artist lighting the scene.

Shadow mapping is one of the most popular general-purpose real-time shadowing algorithms today. A major issue that arises when using shadow maps is aliasing. In Chapter 11, "Shadow Map Antialiasing," Mike Bunnell and Fabio Pellacini describe how to reduce shadow map aliasing efficiently through percentage-closer filtering.

Chapter 12, "Omnidirectional Shadow Mapping" by Philipp S. Gerasimov, extends the shadow map idea to correctly handle omnidirectional (point) light sources. Implementation details, including fallbacks depending on hardware capabilities, are included.

Most shadows in real-time games are hard-edged and aliased, because the lights that cast them are approximated as simple point lights without area. In the real world, all lights have nonzero area, and therefore all shadows have varying degrees of softness. In Chapter 13, "Generating Soft Shadows Using Occlusion Interval Maps," Will Donnelly and Joe Demers introduce a new technique for accurately rendering soft shadows in static scenes with lights that move along predetermined paths. This technique was used in the NVIDIA GeForce FX 5900 launch demo, "Last Chance Gas."

Simon Kozlov continues the antialiasing crusade in Chapter 14, "Perspective Shadow Maps: Care and Feeding." He presents new ideas on optimizing perspective shadow maps, a new kind of shadow map introduced by Stamminger and Drettakis at SIGGRAPH 2002. Perspective shadow maps strive to reduce or eliminate shadow map aliasing artifacts by maximizing shadow map texel density for objects that are projected to large pixel areas.

Finally, in Chapter 15, "Managing Visibility for Per-Pixel Lighting," John O'Rorke observes that techniques that increase visual complexity also tend to increase the number of batches being sent to the hardware—a crucial metric to minimize if you want to get the best performance out of modern GPUs. This chapter uses a number of visibility techniques to find an optimal set of batches to submit, resulting in large performance gains. The techniques have the nice side effect of reducing both CPU load and GPU load.

Cem Cebenoyan, NVIDIA


Chapter 9. Efficient Shadow Volume Rendering

Morgan McGuire
Brown University

9.1 Introduction

A security guard's shadow precedes him into a vault—enough advance warning to let the thief hide on the ceiling. Ready to pounce on an unwary space marine, the alien predator clings to a wall, concealed in the shadow of a nearby gun turret. Yellow and red shadows of ancient marbled gods flicker on the walls of a tomb when the knight's torch and the druid's staff illuminate the statues inside. These are just a few vivid examples of how real-time shadows are used today in gaming.

Real-time shadows are now required for new 3D games. Gamers are accustomed to the perceptual, strategic, and cinematic benefits of realistic lighting. Unlike other effects, shadows aren't rendered objects. Instead, they are areas of the screen that are darker than others because they receive less light during illumination calculations. The hard part of adding shadows to a rendering engine is finding those areas in real time. This chapter describes how to use shadow volumes, the shadowing method used in games such as id Software's Doom 3, to mark shadowed pixels in the stencil buffer. See Figure 9-1. Once each pixel is classified as shadowed or illuminated, it's simple to modify the pixel program responsible for lighting in order to zero out the illumination contribution at shadowed pixels.

Figure 9-1 A Scene from id Software's

9.1.1 Where to Use Shadow Volumes

The shadow volume technique creates sharp, per-pixel accurate shadows from point, spot, and directional lights. A single object can be lit by multiple lights, and the lights can have arbitrary colors and attenuation. The shadows are cast from triangle meshes onto whatever is in the depth buffer. This means that the objects being shadowed can be meshes, billboards, particle systems, or even prerendered scenes with depth buffers.

Compared to other algorithms, shadow volumes can handle many difficult-to-shadow scenes well. Figure 9-2 shows one such problematic scene. The only light source is a point light inside the jack-o'-lantern. The entire scene is in shadow except for the triangular patches of ground illuminated by light that shines out through the holes in the pumpkin. This is a hard case for several reasons. It inverts our usual assumption that most of the scene is lit and shadows are small—rarely do shadows enclose the entire scene. The lit areas are very large compared to the holes in the pumpkin that create them. Although light shines out through only the front and the bottom, the light is omnidirectional and shadows must be considered from all angles. Finally, the shadow caster is more than close to the light source: it surrounds it.

Figure 9-2 A Difficult Scene for Shadows: Light Inside a Jack-o'-Lantern

Shadow volumes are not ideal for all scenes. The technique involves constructing a 3D volume that encloses all shadows cast by an object. This volume is constructed from the shadow caster's mesh; however, some shadow casters do not have a mesh that accurately represents their shape. Examples include a billboard, a particle system, or a mesh textured with an alpha matte (such as a tree leaf). These casters produce shadows based on their actual meshes, which do not match how the objects really appear. For example, a billboard smoke cloud casts a rectangular shadow.

Another problem object is a mesh containing edges that have only a single adjacent face, commonly known as a crack. In the real world, if you look into a crack, you see the inside of the object. Of course, in a rendering engine, you'll see through the object and out the other side because the inside is lined with back-facing polygons culled during rendering. This object is nonsensical as a shadow caster. From some angles, it casts a solid shadow; from other angles, light peeks through the hole and shines out the other side. Even worse, an optimization for the shadow volume breaks when using these objects, creating a large streak of darkness hanging in empty space, as shown in Figure 9-3.

Figure 9-3 Cracks in a Model Let Shadows "Leak" Through the Air

Another potential limitation of the approach is that it requires that everything in a scene cast shadows. When a character's shadow is cast on a wall, it is also cast on everything behind the wall. The only reason the viewer doesn't see the shadow on the other side of the wall is that the wall casts its own shadow that overlaps it. If you cast shadows from characters but not from scene geometry, the shadows appear to go through solid objects.

The ideal scene for shadow volume performance is a top view, such as those found in many real-time strategy, sports, and above-ground adventure games. Such a scene is lit from a few downward-pointing directional lights, and the camera is above all the objects, looking down at the ground. The worst case for performance is a scene with multiple point lights in the middle of a large number of shadow-casting objects—such as a large party of torch-wielding adventurers in an underground room with pillars.

9.2 Program Structure

The shadow volume technique consists of two parts: constructing the volumes from silhouette edges and rendering them into the stencil buffer. These parts are repeated for each light source, and the resulting images are added together to create a final frame (a process called multipass rendering). The basic algorithm is easy to understand and implement, but it is slow for big scenes. To address this, a series of optimizations reduces the geometry-processing and fill-rate requirements.

We begin with a high-level view of the program structure. We follow up with a detailed discussion of each step, and then we look at several optimizations. Finally, we peek into the future by examining several research projects on shadow volumes.

9.2.1 Multipass Rendering

Mathematically, the illumination at a point is the sum of several terms. We see this in the Phong illumination equation for a single light, which is the sum of ambient, emissive (internal glow), diffuse, and specular components. A scene with multiple lights has a single ambient term and a single emissive term, but it has one diffuse term and one specular term for each light. When rendering without shadows, multiple lights can all be rendered in a single pass. This is typically done by enabling multiple hardware light sources or implementing a pixel shader with code for each light.

When rendering with shadows, the contribution from a given light is zero at some points because those points are shadowed. To account for this, the diffuse and specular contribution from each light is computed in a separate rendering pass. The final image is the sum of an initial pass that computes ambient and emissive illumination and the individual lighting passes. Because the initial pass writes depth values into the z-buffer, the additional passes have zero overdraw and can be substantially cheaper in terms of fill rate. Objects rendered in the additional passes are also good candidates for occlusion culling.
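Written as a single formula (the notation is ours, not the chapter's), the multipass image for k lights is as follows, where $s_i$ is 1 at points lit by light i and 0 at points that light i leaves in shadow:

$$
I = I_{\mathrm{ambient}} + I_{\mathrm{emissive}} + \sum_{i=1}^{k} s_i \left( I_{\mathrm{diffuse},i} + I_{\mathrm{specular},i} \right)
$$

The first two terms come from the initial pass; each term of the sum is one lighting pass, and $s_i$ is the per-pixel factor that the stencil test supplies for that pass.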

Although shadow volumes do not create the soft shadows cast by area sources, multiple passes can be exploited to create a similar effect by distributing multiple, dim spotlights over the surface of an area light. Unfortunately, for a complex scene having enough lights to make this look good, this method is too slow to be practical. (A new research technique, described in Assarsson et al. 2003, suggests a more efficient way of rendering soft shadows with shadow volumes.)

The individual lighting passes are combined using alpha blending. To do this, render the ambient/emissive pass to the back buffer with depth writing enabled and the blending function set to glBlendFunc(GL_ONE, GL_ZERO). This initializes the depth buffer and creates the base illumination.

Then for the light passes, disable depth writing and change the blending function to glBlendFunc(GL_ONE, GL_ONE). This blending mode adds newly rendered pixels to the ones already there. The pre-initialized depth buffer prevents overdraw. Also, be sure to set the depth test to glDepthFunc(GL_LEQUAL) to avoid z-fighting between subsequent passes.

With these settings, make one pass for each light source. Each pass clears the stencil buffer, marks shadowed areas in it, and then computes the illumination in nonshadowed areas and adds it to the frame buffer.
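The blend and stencil settings above can be modeled on the CPU to check the arithmetic. The sketch below is a minimal single-channel model with hypothetical names (Framebuffer, ambientPass, lightPass) and a four-pixel buffer; it is not OpenGL code, only an illustration of what GL_ONE/GL_ZERO, GL_ONE/GL_ONE, and an equal-to-zero stencil test do to each pixel.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical four-pixel, single-channel framebuffer for illustration.
constexpr int kPixels = 4;

struct Framebuffer {
  std::array<float, kPixels> color{};
};

// Ambient/emissive pass with glBlendFunc(GL_ONE, GL_ZERO):
// result = 1 * source + 0 * destination, i.e., overwrite.
void ambientPass(Framebuffer& fb, const std::array<float, kPixels>& base) {
  for (int p = 0; p < kPixels; ++p) {
    fb.color[p] = 1.0f * base[p] + 0.0f * fb.color[p];
  }
}

// One light pass with glBlendFunc(GL_ONE, GL_ONE) and
// glStencilFunc(GL_EQUAL, 0, ~0): fragments are added to the buffer,
// but only where this light's stencil count is zero (not shadowed).
void lightPass(Framebuffer& fb,
               const std::array<float, kPixels>& lightTerm,
               const std::array<std::uint8_t, kPixels>& stencil) {
  for (int p = 0; p < kPixels; ++p) {
    if (stencil[p] == 0) {
      fb.color[p] += lightTerm[p];
    }
  }
}
```

Each pixel ends up with the ambient term plus the contributions of exactly those lights whose stencil count was zero there, which is the per-light sum the multipass structure is meant to produce.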

The overall structure of the rendering part of the program is shown in Figure 9-4.

Figure 9-4 Program Structure Diagram

A simplified version of this procedure appears in Listing 9-1. The simplification is that the "mark shadows" step is reduced to the worst case, in which every one of the conditionals in the diagram returns true. After walking through the code in detail, we'll put the shorter paths back in as optimizations. The places in the code that these optimizations will change are marked with parenthetical comments so they are easy to find later.

Listing 9-1. Program Structure Pseudocode

static const float black[] = {0.0f, 0.0f, 0.0f, 0.0f};

glPushAttrib(GL_ALL_ATTRIB_BITS);
setupCamera();

// -- Ambient + emissive pass --

// Clear depth and color buffers
glClear(GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT);

glBlendFunc(GL_ONE, GL_ZERO); glEnable(GL_BLEND);
glDepthMask(GL_TRUE); glDepthFunc(GL_LEQUAL);
glEnable(GL_LIGHTING); glDisable(GL_LIGHT0);
glLightModelfv(GL_LIGHT_MODEL_AMBIENT, globalAmbient);
drawScene();

// -- Light passes --
// Additive blending; depth writes stay off so the z-buffer keeps
// the visible-surface depths from the ambient pass
glLightModelfv(GL_LIGHT_MODEL_AMBIENT, black);
glBlendFunc(GL_ONE, GL_ONE);
glDepthMask(GL_FALSE); glEnable(GL_LIGHT0);
glEnable(GL_STENCIL_TEST);
glEnable(GL_STENCIL_TEST_TWO_SIDE_EXT);

for (int i = numLights - 1; i >= 0; --i) {
  // (The "XY" clipping optimizations set the scissor
  // region here.)

  //-- Mark shadows from all casters --

  // Clear stencil buffer and switch to stencil-only rendering.
  // With two-sided stencil, glStencilFunc and glStencilMask affect
  // only the active face, so set both faces.
  glActiveStencilFaceEXT(GL_BACK);
  glStencilFunc(GL_ALWAYS, 0, ~0); glStencilMask(~0);
  glActiveStencilFaceEXT(GL_FRONT);
  glStencilFunc(GL_ALWAYS, 0, ~0); glStencilMask(~0);
  glClear(GL_STENCIL_BUFFER_BIT);
  glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
  glDisable(GL_LIGHTING);
  // z-fail counting needs an ordinary depth test; the previous
  // light pass left it at GL_EQUAL
  glDepthFunc(GL_LESS);

  loadVertexShader();

  for (int c = 0; c < numCasters; ++c) {
    // (The "point and spot" optimization marks shadows
    // only for casters inside the light's range)

    // Object-space homogeneous light vector for the vertex shader
    setVertexParam("L", object[c]->cframe.inv() * light[i].direction);

    object[c]->markShadows(light[i].direction);
  }
  unloadVertexShader();

  //-- Add illumination --

  // Configure lighting
  configureLight(light[i]);

  // Light only pixels whose stencil count is zero (not shadowed)
  glEnable(GL_LIGHTING);
  glActiveStencilFaceEXT(GL_FRONT);
  glStencilFunc(GL_EQUAL, 0, ~0);
  glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
  glActiveStencilFaceEXT(GL_BACK);
  glStencilFunc(GL_EQUAL, 0, ~0);
  glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
  glDepthFunc(GL_EQUAL); glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
  glEnable(GL_CULL_FACE); glCullFace(GL_BACK);

  // (The "point and spot" optimization adds illumination
  // only for objects inside the light's range)
  drawScene();
}

glPopAttrib();

9.2.2 Vertex Buffer Structure

The shadow of a mesh is cast by its silhouette. To quickly find the silhouette edges and extrude them into a shadow volume, meshes need more information than what's needed in a traditional rendering framework that uses only face triangles.

In our system, the vertex buffer for a mesh contains two copies of each vertex. Say there are n original vertices. Elements 0 through n - 1 of the vertex buffer contain typical vertices, of the form (x, y, z, 1). Elements n through 2n - 1 are copies of the first set but have the form (x, y, z, 0). The first set can be used for normal rendering. Both sets will be used for shadow determination, where a vertex shader will transform the second set to infinity.
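A minimal sketch of building such a buffer, assuming simple hypothetical Vec3/Vec4 structs in place of the engine's own vector types:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector types for illustration only.
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

// Builds the 2n-element shadow vertex buffer: elements [0, n) are the
// original vertices with w = 1; elements [n, 2n) are copies with w = 0
// that the vertex shader later extrudes to infinity.
std::vector<Vec4> buildShadowVertexBuffer(const std::vector<Vec3>& vertex) {
  const std::size_t n = vertex.size();
  std::vector<Vec4> buffer(2 * n);
  for (std::size_t i = 0; i < n; ++i) {
    const Vec3& v = vertex[i];
    buffer[i]     = Vec4{v.x, v.y, v.z, 1.0f};  // finite copy
    buffer[i + n] = Vec4{v.x, v.y, v.z, 0.0f};  // copy destined for infinity
  }
  return buffer;
}
```

Vertex i and its copy at infinity are then always n elements apart, so cap and side code can reach the infinite copy simply by adding n to an index.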

Objects also must have adjacency information and per-face normals. For every face, we need to know the three counterclockwise vertex indices and the surface normal. For every edge, we need the indices of the two adjacent faces and the indices of the two vertices. As mentioned previously, the model must be closed so it has no cracks. In terms of adjacency information, this means that every edge has exactly two adjacent faces that contain the same vertices but in opposite order. By convention, let the first face index of an edge be the one in which the edge vertices are traversed in order, and let the second index be the face in which the vertices are traversed in the opposite order. Note that there may be vertices in the model that are not in any edge or face. This is because it is a common practice when creating 3D models to collocate vertices with different texture coordinates. For adjacency information, we care only about the raw geometry and ignore the texture coordinates, normals, vertex colors, and so on that are stored with a model for rendering purposes.
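The adjacency described above can be derived from the face list alone. The following sketch (hypothetical Face/Edge types; keying a std::map on directed edges is just one convenient choice) records which face traverses each directed edge, then pairs each edge with its reverse. A directed edge with no reverse is a crack, and a directed edge used twice means inconsistent winding or a non-manifold mesh; both are reported as errors.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical mesh types for illustration.
struct Face { int vertex[3]; };               // counterclockwise indices
struct Edge { int vertex[2]; int face[2]; };

// Builds edge adjacency for a closed mesh. Following the convention in
// the text, vertex[0] -> vertex[1] runs counterclockwise in face[0]
// and in the opposite direction in face[1].
std::vector<Edge> buildEdges(const std::vector<Face>& faces) {
  // Directed edge (a, b) -> index of the face that traverses it.
  std::map<std::pair<int, int>, int> directed;
  for (int f = 0; f < (int)faces.size(); ++f) {
    for (int k = 0; k < 3; ++k) {
      const int a = faces[f].vertex[k];
      const int b = faces[f].vertex[(k + 1) % 3];
      if (!directed.emplace(std::make_pair(a, b), f).second) {
        throw std::runtime_error("directed edge used twice");
      }
    }
  }
  std::vector<Edge> edges;
  for (const auto& entry : directed) {
    const int a = entry.first.first;
    const int b = entry.first.second;
    auto twin = directed.find(std::make_pair(b, a));
    if (twin == directed.end()) {
      throw std::runtime_error("edge with a single adjacent face (crack)");
    }
    if (a < b) {  // emit each undirected edge exactly once
      edges.push_back(Edge{{a, b}, {entry.second, twin->second}});
    }
  }
  return edges;
}
```

Because faces are matched on raw vertex indices, collocated vertices that differ only in texture coordinates or normals need to be merged into one index for this step; otherwise every texture seam shows up as a crack.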

9.2.3 Working at Infinity

Unlike other OpenGL programs you may have written, shadow volumes make extensive use of coordinates at infinity. Shadow volumes themselves consist of both finite geometry and geometry at infinity. The algorithm is implemented for point light sources, and directional lights are handled as point lights at infinity. The far clipping plane must be at infinity so that it will not cut off the infinite shadow volumes, and the perspective projection must be configured to take this into account.

OpenGL provides full support for working at infinity using homogeneous coordinates. This section reviews homogeneous vertices (for geometry and light sources) and shows how to configure an infinite-perspective matrix.

Finite homogeneous points are represented as (x, y, z, 1); that is, the w component is equal to 1. This implicitly means the point at 3D position (x/1, y/1, z/1). Perspective projection matrices use the w component to divide through by a nonunit value, creating vertices such as (x, y, z, -z) that become (x/-z, y/-z, -1) after the homogeneous divide. What happens when the w component is zero? We get a point that has the form (x/0, y/0, z/0). This point is "at infinity." Of course, if we actually divided each component by 0 and computed the point, it would become (∞, ∞, ∞), which throws away important information: the direction in which the point went to infinity. The (x, y, z, 0) representation uses w = 0 to flag the point as "at infinity" but retains the directional information (x, y, z).

Intuitively, a point at infinity acts as if it is very far away, regardless of the physical dimensions of the scene. Like stars in the night sky, points at infinity stay fixed as the viewer's position changes, but they rotate according to the viewer's orientation. OpenGL renders points with w = 0 correctly. Again like stars, they appear as if rendered on a sphere "at infinity" centered on the viewer. Note that for a point at infinity, only the direction (x, y, z) is important, not the magnitude of the individual components. It is not surprising that OpenGL therefore uses w = 0 to represent a directional light as a point light whose position is the vector to the light: a directional light is a point light that has been moved to infinity along a specific direction.
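The claim that points at infinity ignore the viewer's position but follow its orientation falls directly out of matrix multiplication: the translation column of a view matrix is multiplied by w, which is 0. A small check, using a hypothetical row-major Mat4 alias and a made-up view matrix (a 90-degree rotation about z plus a translation):

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<std::array<double, 4>, 4>;  // row-major here

// Multiplies a row-major 4x4 matrix by a column vector.
Vec4 mul(const Mat4& m, const Vec4& v) {
  Vec4 r{};
  for (int i = 0; i < 4; ++i)
    for (int j = 0; j < 4; ++j) r[i] += m[i][j] * v[j];
  return r;
}

// Illustrative rigid view transform: rotate 90 degrees about z, then
// translate by (tx, ty, tz). The translation sits in the last column,
// so it is scaled by the input's w component.
Mat4 viewMatrix(double tx, double ty, double tz) {
  return Mat4{{{0, -1, 0, tx},
               {1,  0, 0, ty},
               {0,  0, 1, tz},
               {0,  0, 0, 1}}};
}
```

Moving the viewer changes where a finite point lands but not where a w = 0 point lands, while rotating the viewer rotates both.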

Throughout this chapter, we use w = 0 to represent points at infinity. We'll not only use point lights at infinity, but also extrude shadow volumes to infinity. In the previous section, we used w = 0 as a notation in the second half of the vertex buffer. This was because those vertices will be moved to infinity (they are the infinite end of the shadow volume). The vertex shader will move them relative to the light before they are actually transformed to infinity, however.

When rendering all of these objects at infinity, we can't have them clipped by the far plane. Therefore, we need to move the far clipping plane to infinity. This is done by computing the limit of the standard projection matrix as the far plane moves to infinity:

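$$
P_{\infty} = \lim_{f \to \infty}
\begin{bmatrix}
\frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\
0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\
0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\
0 & 0 & -1 & 0
\end{bmatrix}
=
\begin{bmatrix}
\frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\
0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\
0 & 0 & -1 & -2n \\
0 & 0 & -1 & 0
\end{bmatrix}
$$

where l, r, b, t, n, and f are the left, right, bottom, top, near, and far values passed to glFrustum.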

In code, this is a minor change to the way we compute the perspective projection matrix. Just create the projection matrix as shown in Listing 9-2 instead of using glFrustum.

Listing 9-2. An Infinite Projection Matrix in the Style of glFrustum

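#include <cmath>   // std::isinf

// Loads a glFrustum-style projection. Pass an infinite farval
// (e.g., std::numeric_limits<double>::infinity()) for an infinite
// far plane. Call with glMatrixMode(GL_PROJECTION) active.
void perspectiveProjectionMatrix(double left,
                                 double right,
                                 double bottom,
                                 double top,
                                 double nearval,
                                 double farval)
{
  double x, y, a, b, c, d;
  x = (2.0 * nearval) / (right - left);
  y = (2.0 * nearval) / (top - bottom);
  a = (right + left) / (right - left);
  b = (top + bottom) / (top - bottom);

  if (std::isinf(farval)) {
    // Infinite view frustum: limits of c and d as farval -> infinity
    c = -1.0;
    d = -2.0 * nearval;
  } else {
    c = -(farval + nearval) / (farval - nearval);
    d = -(2.0 * farval * nearval) / (farval - nearval);
  }

  // Column-major order, as glLoadMatrixd expects
  double m[] = {x, 0, 0, 0,
                0, y, 0, 0,
                a, b, c, -1,
                0, 0, d, 0};

  glLoadMatrixd(m);
}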
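To see why the infinite matrix keeps geometry at infinity visible, the same arithmetic can be run on the CPU. This sketch (frustumMatrix, toClip, and insideDepthRange are hypothetical helpers, not part of the chapter's code) builds the matrix of Listing 9-2 without a GL context and transforms a w = 0 point straight down the view axis. With the infinite matrix, clip-space z equals w, so the point lands exactly on the far plane (depth 1) and survives clipping; with a finite far plane, z exceeds w and the point is clipped.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <limits>

using Mat4 = std::array<double, 16>;  // column-major, like glLoadMatrixd
using Vec4 = std::array<double, 4>;

// Same arithmetic as Listing 9-2, returning the matrix instead of loading it.
Mat4 frustumMatrix(double l, double r, double b, double t, double n, double f) {
  const double x = 2.0 * n / (r - l), y = 2.0 * n / (t - b);
  const double a = (r + l) / (r - l), bb = (t + b) / (t - b);
  double c, d;
  if (std::isinf(f)) {  // far plane at infinity
    c = -1.0;
    d = -2.0 * n;
  } else {
    c = -(f + n) / (f - n);
    d = -(2.0 * f * n) / (f - n);
  }
  return Mat4{x, 0, 0, 0,  0, y, 0, 0,  a, bb, c, -1,  0, 0, d, 0};
}

// Column-major matrix times column vector: eye space to clip space.
Vec4 toClip(const Mat4& m, const Vec4& v) {
  Vec4 out{};
  for (int row = 0; row < 4; ++row)
    for (int col = 0; col < 4; ++col) out[row] += m[col * 4 + row] * v[col];
  return out;
}

// OpenGL keeps a vertex in depth only if -w <= z <= w in clip space.
bool insideDepthRange(const Vec4& clip) {
  return -clip[3] <= clip[2] && clip[2] <= clip[3];
}
```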

The Cg vertex shader from Listing 9-3 transforms points with w = 1 normally and sends points with w = 0 to infinity away from the light.

Listing 9-3. A Vertex Shader for Extruding w = 0 Vertices Away from the Light

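// VIn and VOut are assumed to be structs whose float4 member
// pos is bound to the POSITION semantic.
VOut main(uniform float4x4 MVP,   // model-view-projection matrix
          uniform float4 L,       // object-space light vector
          in VIn vin)
{
  VOut vout;
  // (The "directional" optimization eliminates the vertex shader
  // by using different rendering loops for point and directional
  // lights.)

  // w = 1: transform normally. w = 0: replace the position with the
  // direction from the light through the vertex; it stays at infinity.
  vout.pos = mul(MVP, (vin.pos.w == 0) ?
                      float4(vin.pos.xyz * L.w - L.xyz, 0) :
                      vin.pos);

  return vout;
}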

The conditional operator (?:) can be replaced with a call to the lerp function on older graphics cards that don't support branching in vertex shaders; because w is exactly 0 or 1, lerp(float4(vin.pos.xyz * L.w - L.xyz, 0), vin.pos, vin.pos.w) selects the same result. Note that multiplying the vertex position by L.w in the middle line makes the vertex position irrelevant for a directional light. This is because the vector from the light to a point is independent of the point's position for a directional light. In Listing 9-1, the call to setVertexParam sets the object-space light vector. The implementations of loadVertexShader, unloadVertexShader, and setVertexParam depend on the vertex shader runtime used.
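The extrusion in Listing 9-3 is easy to check on the CPU before the MVP transform. The following C++ restatement of the shader's per-vertex logic uses a hypothetical Float4 struct; it shows that w = 0 copies are pushed directly away from a point light and that, for a directional light, every copy extrudes in the same direction regardless of its position.

```cpp
#include <cassert>

// Hypothetical stand-in for Cg's float4.
struct Float4 { double x, y, z, w; };

// CPU restatement of Listing 9-3's logic, before the MVP transform.
// L is the object-space light: (x, y, z, 1) for a point or spot light,
// (x, y, z, 0) for a directional light, where (x, y, z) points toward it.
Float4 extrude(const Float4& pos, const Float4& L) {
  if (pos.w == 0.0) {
    // Direction from the light through the vertex, kept at infinity.
    return Float4{pos.x * L.w - L.x, pos.y * L.w - L.y, pos.z * L.w - L.z, 0.0};
  }
  return pos;  // finite vertices are unchanged
}
```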

9.3 Detailed Discussion

The goal of markShadows is to set the stencil buffer to zero for illuminated pixels and to a nonzero number for shadowed pixels. It does this by constructing a shadow volume—the geometry that bounds the shadow regions—and rendering it into the stencil buffer. Here we briefly look at the mathematical justification for this, and then we cover the implementation in detail.

9.3.1 The Math

Figure 9-5 shows a simple scene with a single point light (the light bulb icon), a shadow-casting box, a shadow-receiving ground plane, and a viewer on the left. The line in front of the viewer represents the image plane, which is important to the discussion in Section 9.5. The blue arrows represent light rays from the source (for clarity, only a few are shown). The ground plane is bright where the leftmost and rightmost rays strike it. The center rays hit the shadow caster instead and are blocked. The ground plane is dark (shadowed) underneath the caster where these rays are blocked. Somewhere between the outer and the inner rays in the diagram are critical lines, shown dashed. These lines mark the edges of the shadow. Note that they pass through the center of the light and the edges of the shadow caster. The diagram is 2D; in 3D, these are not lines but quadrilaterals. These lines are the sides of the shadow volume. Everything that lies between them and is farther from the light than the shadow caster is shadowed. All other points are illuminated.

Figure 9-5 A Simple Scene

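Figure 9-6 shows the shadow volume explicitly. The shadow volume has three pieces: the light cap, formed by the caster's faces that face the light; the dark cap, formed by the caster's light back faces pushed to infinity; and the sides, extruded from the silhouette edges.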

Figure 9-6 Shadow Volume for the Simple Scene

Although the figures show a 2D diagram of a simple scene, keep in mind that the shadow volumes are in 3D and may have complicated geometry if the shadow caster has a complicated shape. For comparison, the geometry of real 3D shadow volumes is shown in Figures 9-2, 9-7, and 9-10. If there are multiple shadow casters (and there usually are), the shadow volume will have many separate parts. These parts might even overlap. None of this is a problem; the algorithm handles a triangle or a complete scene equally well without any special work on our part.

Figure 9-7 A Shadowed Character from

Here's a mathematical strategy for performing shadow determination using the shadow volume. When rendering, each pixel corresponds to a point in 3D space. We want to set the stencil buffer to a nonzero value (shadowed) at that pixel if the point is inside the shadow volume; otherwise, we'll set it to zero (illuminated). Call the point in question P. Consider intersections between the ray that starts at P and travels to infinity along the negative view vector, -V, and the shadow volume. There are two kinds of intersections. An entering intersection occurs when the ray moves from outside the shadow volume to inside. Let M be the surface normal to the shadow face intersected. At an entering intersection, M · V > 0. An exiting intersection occurs when the ray leaves a shadow volume and has M · V < 0 (ignore glancing intersections where M · V = 0). The key idea is to count the number of occurrences of each kind of intersection:

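Point P is in shadow if and only if there were more exiting intersections than entering intersections along the ray to infinity. (A point inside a shadow volume must leave it on the way to infinity, because the caps close the volume, whereas a ray from a point outside enters and leaves each volume equally often.)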

Rays that travel along the negative view vector lie within exactly one pixel under perspective projection. We exploit this fact to perform the intersection counts in hardware using the stencil buffer, which makes the method fast.

9.3.2 The Code

Here's how to implement our observations efficiently in hardware. Initialize the stencil buffer to zero and enable wrapping increment and decrement operations, if supported on the graphics card. (If wrapping is not supported, initialize all stencil values to 128 or some other value to avoid underflow.) Disable color rendering and render the shadow volume geometry to the stencil buffer. Because we're counting intersections with the ray that starts at each visible point and travels away from the viewer, set up the hardware to change the stencil value when the depth test fails. The stencil buffer is decremented for each front-face pixel that fails the depth test and incremented for each back-face pixel that fails the depth test.

Note that we disabled color rendering immediately before rendering shadow volumes, and we disabled depth writing a while ago, after the ambient illumination pass. Because both color and depth writing are disabled, rendering shadow volumes affects only the stencil buffer. Color writing must be disabled because we don't want to see the shadow volumes in the final image, just the shadows (which are based on the stencil counts). Depth writing needs to be disabled because we assumed that the depth values in the z-buffer represent the depths of visible surfaces (and not shadow volumes). Because depth writing is disabled, shadow faces do not interact with each other, and so the order in which they are rendered does not matter.

After rendering, the stencil value at a pixel will be zero if the same number of front and back faces failed the depth test there, and nonzero if the counts differ. Entering intersections are always created by front faces, and exiting intersections are always created by back faces. Because front faces decrement the count and back faces increment it, the stencil count after rendering is the number of exiting intersections minus the number of entering intersections, which is nonzero exactly when P lies inside at least one shadow volume: precisely the result we want for shadow determination.
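The counting argument can be checked with a small single-pixel simulation. The sketch below is ours (ShadowFragment and zFailStencil are hypothetical names): it assumes an 8-bit wrapping stencil and a GL_LESS depth test, and applies the z-fail rule described above, decrementing for front faces and incrementing for back faces that fail the depth test.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One shadow-volume fragment covering the pixel under test.
struct ShadowFragment {
  double depth;      // window-space depth of the shadow-volume face
  bool frontFacing;  // true if the face points toward the viewer
};

// z-fail stencil count for one pixel whose visible surface is at
// surfaceDepth. Fragments that pass the depth test leave the stencil
// alone; failing front faces decrement and failing back faces
// increment, both wrapping modulo 256 like GL_DECR_WRAP/GL_INCR_WRAP.
std::uint8_t zFailStencil(double surfaceDepth,
                          const std::vector<ShadowFragment>& fragments) {
  std::uint8_t stencil = 0;
  for (const ShadowFragment& frag : fragments) {
    const bool depthFails = !(frag.depth < surfaceDepth);  // GL_LESS
    if (!depthFails) continue;
    if (frag.frontFacing) {
      stencil = static_cast<std::uint8_t>(stencil - 1);
    } else {
      stencil = static_cast<std::uint8_t>(stencil + 1);
    }
  }
  return stencil;  // zero = lit, nonzero = shadowed
}
```

A surface inside the volume has only the volume's back face behind it, so its count is 1; a surface in front of the volume has both faces fail, and the wrapped decrement and increment cancel to 0 in either rendering order.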

9.3.3 The markShadows Method

The code for the markShadows method on the Object class is shown in Listing 9-4.

First, we take the light vector from world space to object space. For a point light or spotlight, this vector is the position (x, y, z, 1). For a directional light, it has the form (x, y, z, 0), where (x, y, z) is the vector to the light source. In general, a homogeneous vector with w = 0 can be thought of as a point on a sphere at infinity. A directional light is therefore the same as a point light at infinity.

Listing 9-4. The markShadows Method

// backface[f] = true if face f faces away from the light
std::vector<bool> backface;

void Object::markShadows(const Vector4& wsL)   // World-space light vector
{
  // (When the viewport is not shadowed by this object, this
  // section is changed by the "uncapping" optimization.)

  // Decrement on front faces; increment on back faces
  // (a.k.a. z-fail rendering)
  glActiveStencilFaceEXT(GL_FRONT);
  glStencilOp(GL_KEEP, GL_DECR_WRAP_EXT, GL_KEEP);
  glActiveStencilFaceEXT(GL_BACK);
  glStencilOp(GL_KEEP, GL_INCR_WRAP_EXT, GL_KEEP);

  // Both sides of the shadow volume must be rasterized
  glDisable(GL_CULL_FACE);

  // (The "Z bounds" optimization sets the depth bounds here.)

  // Take light to object space and compute light back faces
  findBackfaces(cframe.inv() * wsL);

  // Set up for vertex buffer rendering (glVertexBuffer stands in
  // for the engine's vertex-buffer binding call)
  glVertexBuffer(vertexBuffer);

  renderShadowCaps();

  renderShadowSides();

  glVertexBuffer(NULL);
}

With this object-space light vector, we compute the light front faces and light back faces. The facing directions are needed only temporarily and are stored in a global array. The (double-length) vertex buffer is then selected, and we render the shadow light and dark caps as triangles. Finally, the sides of the shadow volume are rendered as quads.

9.3.4 The findBackfaces Method

The findBackfaces method iterates over each face and computes N · L, as shown in Listing 9-5.

Listing 9-5. The findBackfaces Method

void Object::findBackfaces(const Vector4& osL) // Object-space light vector
{
  backface.resize(face.size());
  for (int f = 0; f < (int)face.size(); ++f) {
    // Vector from the face toward the light; for a directional
    // light (osL.w == 0) the vertex term vanishes
    Vector3 L = osL.xyz() - vertex[face[f].vertex[0]] * osL.w;
    backface[f] = dot(face[f].normal, L) < 0;
  }
}

For a finite point light, the vector to the specific polygon is needed, so we subtract the position of one face vertex from the light position. For directional lights, the light direction is used unchanged. For performance, these cases can be handled in separate loops; they are combined in this example only for brevity. Note that none of the vectors needs to have unit length, because we're interested in only the sign of N · L, not the magnitude.
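A self-contained version of the same test, with hypothetical V3/V4/Tri types standing in for the chapter's Vector3, Vector4, and face classes, and returning the flags instead of writing a global array:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal types for illustration.
struct V3 { double x, y, z; };
struct V4 { double x, y, z, w; };
struct Tri { int vertex[3]; V3 normal; };

double dot(const V3& a, const V3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Same test as Listing 9-5: the vector from a face vertex toward the
// light is L.xyz - v * L.w, which reduces to L.xyz for a directional
// light (w = 0). Only the sign of N . L matters, so nothing is normalized.
std::vector<bool> findBackfaces(const std::vector<V3>& vertex,
                                const std::vector<Tri>& face,
                                const V4& osL) {
  std::vector<bool> backface(face.size());
  for (std::size_t f = 0; f < face.size(); ++f) {
    const V3& p = vertex[face[f].vertex[0]];
    const V3 toLight{osL.x - p.x * osL.w, osL.y - p.y * osL.w, osL.z - p.z * osL.w};
    backface[f] = dot(face[f].normal, toLight) < 0;
  }
  return backface;
}
```

For example, an upward-facing floor triangle and a downward-facing ceiling triangle are both light front faces for a point light between them, but the ceiling becomes a light back face once the light moves above it or is replaced by a directional light shining down.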

If the model is animated, the face normals must be recomputed from the animated vertices for every frame. This precludes the use of matrix skinning or vertex blending in hardware, because the modified geometry would then not be available on the CPU. At the end of this chapter, we discuss some proposed techniques for combining shadow volumes with hardware vertex displacement.

9.3.5 Light and Dark Caps

Given the back face array, we can compute the caps and shadow volume sides. In each case, we will accumulate a list of vertex indices and then render the indices from the vertex buffer with glDrawElements. The indices are temporarily stored in another global array, called index.

The code for the light and dark caps is shown in Listing 9-6.

Listing 9-6. The renderShadowCaps Method

// Indices into vertex buffer
std::vector<unsigned int> index;

void Object::renderShadowCaps()
{
  // (The "Culling" optimization changes this method
  // to try to cull the light and dark caps separately.)

  // n = number of original vertices (assumes `vertex` holds them, as
  // in Listing 9-5); their copies at infinity occupy n through 2n - 1
  const unsigned int n = (unsigned int)vertex.size();

  index.resize(0);
  for (int f = (int)face.size() - 1; f >= 0; --f) {
    if (backface[f]) {
      // Dark cap (same vertices but at infinity)
      for (int v = 0; v < 3; ++v) {
        index.push_back(face[f].vertex[v] + n);
      }
    } else {
      // Light cap
      for (int v = 0; v < 3; ++v) {
        index.push_back(face[f].vertex[v]);
      }
    }
  }

  glDrawElements(GL_TRIANGLES, (GLsizei)index.size(),
                 GL_UNSIGNED_INT, index.data());
}

Light caps are simply polygons that face the light. To create dark caps, we take the light back faces and send them away from the light, to infinity. To do this, we render from the second set of vertices, which the vertex shader sends to infinity for us.
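In homogeneous coordinates, sending a vertex to infinity means giving it w = 0. The CPU-side sketch below only illustrates the arithmetic applied to the second set of vertices (on the GPU this is the vertex shader's job, Listing 9-3); it assumes the same light convention as findBackfaces and borrows the expression v * L.w - L.xyz that the chapter later uses when culling the dark cap. The names extrudeToInfinity and shadowVolumeVertex are hypothetical.

```cpp
#include <cassert>

// Hypothetical minimal homogeneous types (not the chapter's Vector3/Vector4).
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

// Extrude object-space vertex V away from the homogeneous light L
// (w = 1: point light at L.xyz, w = 0: directional light toward L.xyz).
// The result has w = 0: a point at infinity in direction V * L.w - L.xyz,
// which points from the light through V.
Vec4 extrudeToInfinity(const Vec3& V, const Vec4& L) {
  return Vec4{V.x * L.w - L.x, V.y * L.w - L.y, V.z * L.w - L.z, 0.0f};
}

// Position for vertex index i in a buffer holding the n original vertices
// followed by n copies that are sent to infinity (indices n..2n-1).
Vec4 shadowVolumeVertex(int i, int n, const Vec3* vertex, const Vec4& L) {
  assert(i >= 0 && i < 2 * n);
  if (i < n) {
    const Vec3& V = vertex[i];
    return Vec4{V.x, V.y, V.z, 1.0f};   // finite copy, unchanged
  }
  return extrudeToInfinity(vertex[i - n], L);
}
```

For a directional light (L.w = 0) every extruded vertex becomes the same point, -L, which is the observation behind the directional-light optimization in Section 9.5.1.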

Figure 9-7 shows an animated Quake 3 character standing on white ground lit by a white point light. The shadow volumes of the character are shown in yellow on the right side of the figure. Note that the shape of the dark cap, which is the part of the shadow volume far from the character, is the same as that of the character, but it is enlarged. The light cap is not visible because it is inside the character. The polygons stretching between the light and dark caps are the sides, which are constructed from silhouette edges.

9.3.6 Sides

The sides of the shadow volume are quadrilaterals between the first and second sets of vertices—that is, between the object and infinity. We iterate over the edges of the mesh. Recall that only those edges on the silhouette need be extruded into quads; the other edges do not affect the shadow volume.

A silhouette edge occurs where an object's light back face meets one of its light front faces. All of the information to make such a classification is available to us. The edges store the indices of the two adjacent faces, and the back-face array tells us which face indices correspond to light back faces. See Listing 9-7.

It is important to construct edge information for the mesh with consistent edge orientations, so that the resulting shadow-face quads have correct winding directions. On the shadow faces, the vertices must wind counterclockwise, so that the surface normal points out of the shadow volume. To ensure this, we use a convention in which the directed edge from vertex v0 = edge[e].vertex[0] to vertex v1 = edge[e].vertex[1] is counterclockwise in the mesh face with index edge[e].face[0] and clockwise (backward) in the mesh face with index edge[e].face[1].
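The chapter assumes the edge array already exists. One way to build it with this convention, assuming a closed mesh whose triangles all wind counterclockwise when seen from outside, is sketched below with hypothetical Tri and Edge types: each triangle contributes its three directed edges, the first triangle to traverse an edge becomes face[0], and the neighbor that traverses it in the opposite direction becomes face[1].

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical types: a counterclockwise triangle and an edge record using the
// convention above (v0 -> v1 is CCW in face[0] and CW in face[1]).
struct Tri  { int vertex[3]; };
struct Edge { int vertex[2]; int face[2]; };

// Builds the edge list of a closed, consistently wound triangle mesh.
// Returns false if an edge has only one face, more than two faces, or
// appears twice with the same direction (inconsistent winding).
bool buildEdges(const std::vector<Tri>& face, std::vector<Edge>& edge) {
  edge.clear();
  std::map<std::pair<int, int>, int> open;   // directed (a, b) -> edge index
  std::set<std::pair<int, int>> closed;      // undirected edges with 2 faces
  for (int f = 0; f < static_cast<int>(face.size()); ++f) {
    for (int k = 0; k < 3; ++k) {
      int a = face[f].vertex[k];
      int b = face[f].vertex[(k + 1) % 3];
      std::pair<int, int> undirected(std::min(a, b), std::max(a, b));
      if (closed.count(undirected)) return false;      // third face
      auto reversed = open.find(std::make_pair(b, a));
      if (reversed != open.end()) {
        // Second face traverses the edge backward, so it becomes face[1].
        edge[reversed->second].face[1] = f;
        open.erase(reversed);
        closed.insert(undirected);
      } else if (open.count(std::make_pair(a, b))) {
        return false;                                  // same direction twice
      } else {
        open[std::make_pair(a, b)] = static_cast<int>(edge.size());
        edge.push_back(Edge{{a, b}, {f, -1}});
      }
    }
  }
  return open.empty();   // leftovers are boundary edges (cracks)
}
```

A closed tetrahedron yields six edges, each with two faces; a lone triangle fails because all three of its edges are left open.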

The shadow quad must contain the edge directed in the same way as the back face. Therefore, if face edge[e].face[0] is a back face, the shadow face contains the edge from v0 to v1. Otherwise, it contains the edge from v1 to v0. Figure 9-8 shows the winding direction for the light front face and the shadow quad at an edge directed from v0 to v1.

Figure 9-8 Winding Direction

Listing 9-7. The renderShadowSides Method

void Object::renderShadowSides()
{
  index.clear();

  for (int e = static_cast<int>(edge.size()) - 1; e >= 0; --e) {
    if (backface[edge[e].face[0]] != backface[edge[e].face[1]]) {
      // This is a silhouette edge
      int v0, v1;
      if (backface[edge[e].face[0]]) {
        // Wind the same way as face 0
        v0 = edge[e].vertex[0];
        v1 = edge[e].vertex[1];
      } else {
        // Wind the same way as face 1
        v1 = edge[e].vertex[0];
        v0 = edge[e].vertex[1];
      }

      // (The "directional" optimization changes this code.)
      index.push_back(v0);
      index.push_back(v1);
      index.push_back(v1 + n);
      index.push_back(v0 + n);
    }
  }

  // (The "directional" optimization changes this to use
  // GL_TRIANGLES instead of GL_QUADS.)
  glDrawElements(GL_QUADS, static_cast<GLsizei>(index.size()),
                 GL_UNSIGNED_INT, index.data());
}

We've now walked through the entire shadow-rendering procedure. We've built a system that classifies pixels as shadowed or unshadowed in the stencil buffer and then adds illumination to only the unshadowed pixels. This system can handle many different kinds of light sources and complex shadow-caster geometry. It can also interoperate with other shadow algorithms such as projective shadows and shadow maps. The program can be altered to add illumination only to those areas that pass all the shadow tests.

By taking advantage of some common cases where the shadow volume algorithm is simplified, we can significantly speed up the process. The remainder of this chapter describes ways of speeding up shadow volume creation and rendering. In practice, the following methods can quadruple the speed of the base algorithm.

9.4 Debugging

To see if you are generating the shadow volumes correctly, temporarily enable color rendering and then draw shadow volumes with additive alpha blending. Turn on face culling and use one color for front faces and another for back faces. These shapes have other uses beyond debugging: you might want to render visible shadow volumes during gameplay for effects such as light rays passing through clouds or trees.

Remember that OpenGL performs stencil operations only when the stencil test is enabled, even if the test function is set to GL_ALWAYS. Also, don't forget the stencil write mask: glStencilMask(~0). If you forget either of these prerequisites, your stencil writes will be ignored.

Use assertions to check that every edge has exactly two adjacent faces. If you have cracks in a model, you'll get shadow streaks in the air like those we saw in Figure 9-3. Software modelers such as 3ds max have tools to fix cracks (called welding vertices) automatically—use them!
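A minimal version of that check, assuming a hypothetical Tri type of three vertex indices: count how many triangles use each undirected edge and report every edge whose count is not exactly two. An edge used once is a crack; an edge used three or more times is non-manifold, which an edge record with only two face slots cannot represent either.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical triangle type: three indices into the vertex array.
struct Tri { int vertex[3]; };

// Returns every undirected edge (smaller index first) that is not shared by
// exactly two triangles.
std::vector<std::pair<int, int>> findBadEdges(const std::vector<Tri>& face) {
  std::map<std::pair<int, int>, int> useCount;
  for (const Tri& t : face) {
    for (int k = 0; k < 3; ++k) {
      int a = t.vertex[k];
      int b = t.vertex[(k + 1) % 3];
      ++useCount[std::make_pair(std::min(a, b), std::max(a, b))];
    }
  }
  std::vector<std::pair<int, int>> bad;
  for (const auto& entry : useCount) {
    if (entry.second != 2) bad.push_back(entry.first);
  }
  return bad;
}

// Debug-build check in the spirit of the advice above.
void assertClosedMesh(const std::vector<Tri>& face) {
  assert(findBadEdges(face).empty() && "mesh has cracks; weld vertices");
}
```

Removing one face of a tetrahedron exposes its three edges as cracks.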

9.5 Geometry Optimizations

For clarity and simplicity, the base shadow-volume algorithm was described in the first half of the chapter in generic form, with directional lights, point lights, and spotlights treated the same. We used the mathematical trick L.xyz() - V * L.w in Listing 9-5 and a similar one in the vertex shader in Listing 9-3. These listings compute the light vector for both types of light with a single expression. We can improve performance by treating them separately in the vertex shader and throughout the process of generating shadow volumes. The shadow volume created by a directional light is simpler than that created by a point light, so this can turn into a big savings (at the expense of code complexity).

We can also improve geometry processing performance by using conservative bounding volumes to cull shadow geometry. This section describes these optimizations.

9.5.1 Directional Lights

For a directional light, the light vector is just L.xyz. Because the light vector is the same at all vertices, all vertices in the dark cap are at the same point, which is -L. This means there is no dark cap: the (parallel) sides of the shadow volume converge at infinity to a single point, and so the cap is unnecessary.

Because they converge to a point, the sides are triangles, not quads. The push statements in renderShadowSides (Listing 9-7) become:

index.push_back(v0);
index.push_back(v1);
index.push_back(n);

These statements not only have fewer indices, but they are more friendly to the vertex cache. That's because the same vertex number n is transferred multiple times (we could transfer any one vertex with index greater than or equal to n, because they all transform to the same point). Alternatively, we could eliminate the vertex shader altogether and add one more vertex with index 2n that is set to the negative light vector before each shadow pass.

9.5.2 Point Lights and Spotlights

Point lights are typically attenuated by distance. After a certain distance, the light given off by a point light is negligible (when it drops below 1/255, we can't even see the result in an eight-bit frame buffer). Spotlights are attenuated by angle, and sometimes by distance. Outside the cone of the spotlight, they give no illumination.
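The cutoff distance can be computed in closed form. OpenGL's fixed-function attenuation factor is 1 / (kc + kl*d + kq*d^2), where kc, kl, and kq are the constant, linear, and quadratic coefficients. Assuming the light's brightest color channel is `intensity`, the sketch below finds the distance at which the attenuated value reaches 1/255; the function name and the use of the brightest channel are illustrative choices, not part of the chapter.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Distance d at which intensity / (kc + kl*d + kq*d*d) falls to `threshold`
// (1/255 by default: invisible in an 8-bit-per-channel frame buffer).
// Returns 0 if the light is never above the threshold, and infinity if it
// never falls to it.
float attenuationRange(float kc, float kl, float kq, float intensity,
                       float threshold = 1.0f / 255.0f) {
  assert(kc >= 0.0f && kl >= 0.0f && kq >= 0.0f && threshold > 0.0f);
  // Solve kq*d^2 + kl*d + c = 0 with c = kc - intensity / threshold.
  float c = kc - intensity / threshold;
  if (c >= 0.0f) return 0.0f;
  if (kq > 0.0f) {
    float disc = kl * kl - 4.0f * kq * c;   // positive, because c < 0
    return (-kl + std::sqrt(disc)) / (2.0f * kq);
  }
  if (kl > 0.0f) return -c / kl;
  return std::numeric_limits<float>::infinity();  // constant attenuation only
}
```

For example, kc = 1 and kq = 1 with a full-intensity light gives 1 + d^2 = 255, so the light is effectively off beyond d = sqrt(254), about 15.9 units.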

If an object is outside the effective range of either kind of light source, it does not need to cast a shadow, because any object behind it is also outside the range. Detect this case by testing the bounding box of a shadow caster against the bounding sphere of a distance-attenuated light, or against the cone of an angularly attenuated light. When an object is outside the range, don't mark shadows for it. Likewise, no illumination pass is needed for objects outside the light's range.
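For the distance-attenuated case, the box-versus-sphere test can use the closest-point distance: clamp the sphere center to the box and compare the squared distance with the squared radius. The sketch assumes an axis-aligned box and a sphere expressed in the same space; AABox, Sphere, and casterInLightRange are hypothetical names, and the spotlight cone test is not shown.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical types: the caster's axis-aligned bounding box and the light's
// range sphere, both in the same coordinate space.
struct Vec3   { float x, y, z; };
struct AABox  { Vec3 lo, hi; };
struct Sphere { Vec3 center; float radius; };

// Squared distance from point p to the closest point of box b
// (zero if p is inside the box).
static float squaredDistance(const Vec3& p, const AABox& b) {
  float dx = std::max({b.lo.x - p.x, 0.0f, p.x - b.hi.x});
  float dy = std::max({b.lo.y - p.y, 0.0f, p.y - b.hi.y});
  float dz = std::max({b.lo.z - p.z, 0.0f, p.z - b.hi.z});
  return dx * dx + dy * dy + dz * dz;
}

// True if the caster's box touches the light's range, so its shadow may
// matter. False means: mark no shadows for this caster and this light.
bool casterInLightRange(const AABox& caster, const Sphere& lightRange) {
  return squaredDistance(lightRange.center, caster) <=
         lightRange.radius * lightRange.radius;
}
```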

9.5.3 Culling Shadow Volumes

Just as with regular geometry, the vertex-processing load for shadow volumes can be reduced by culling shadow geometry outside the view frustum. Note that the caster may be outside the view frustum and still cast a shadow on visible objects, so culling the caster and culling its shadow geometry are independent decisions.

Cull the sides and caps separately. For each, approximate the shadow geometry with a geometric primitive and cull that primitive against the view frustum. The light cap can use the same bounds as the caster geometry, because the cap is inside the caster. The dark cap uses the same bounds, but sent to infinity away from the light source.

For example, say a bounding box is available for the caster. Transform each vertex, v, of the bounding box to infinity using the equation v' = MV * (v * L.w - L.xyz), where L is the object-space light vector and MV is the modelview matrix. Then test the transformed bounding box against the view frustum. If the box is culled, the dark cap can also be culled. The shadow volume sides are most easily bounded by a cylinder for directional lights and by a truncated cone for point lights.
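The dark-cap test can be sketched as follows, assuming row-major matrices applied as MV * v and frustum planes given in eye space as (a, b, c, d), with a*x + b*y + c*z + d*w >= 0 on the inside. Because the extruded corners have w = 0, only the plane normals matter. If all eight corners lie outside one plane, so does every point of the dark cap: each caster vertex is a convex combination of the corners, and both the extrusion and the plane test are linear. The type and function names are hypothetical.

```cpp
#include <array>
#include <cassert>

// Hypothetical minimal types; Mat4 is row-major and applied as M * v.
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };

static Vec4 mul(const Mat4& M, const Vec4& v) {
  const float in[4] = {v.x, v.y, v.z, v.w};
  float out[4] = {0.0f, 0.0f, 0.0f, 0.0f};
  for (int r = 0; r < 4; ++r)
    for (int c = 0; c < 4; ++c) out[r] += M.m[r][c] * in[c];
  return Vec4{out[0], out[1], out[2], out[3]};
}

// v' = MV * (v * L.w - L.xyz) with w = 0: box corner v sent to infinity
// away from the object-space light L.
static Vec4 cornerAtInfinity(const Mat4& MV, const Vec3& v, const Vec4& L) {
  return mul(MV, Vec4{v.x * L.w - L.x, v.y * L.w - L.y, v.z * L.w - L.z, 0.0f});
}

// Conservative cull: true if all eight extruded corners lie outside
// the same eye-space frustum plane.
bool darkCapCulled(const Mat4& MV, const std::array<Vec3, 8>& corner,
                   const Vec4& L, const std::array<Vec4, 6>& plane) {
  for (const Vec4& p : plane) {
    bool allOutside = true;
    for (const Vec3& c : corner) {
      Vec4 q = cornerAtInfinity(MV, c, L);
      if (p.x * q.x + p.y * q.y + p.z * q.z + p.w * q.w >= 0.0f) {
        allOutside = false;   // this corner is inside plane p
        break;
      }
    }
    if (allOutside) return true;   // whole dark cap is outside plane p
  }
  return false;                    // not provably outside: draw it
}
```

With an infinite far plane, the far plane can be written as (0, 0, 0, 1), which no point with w >= 0 fails, so it never culls anything by itself.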

Although any culling is helpful, culling the caps particularly speeds up vertex processing because caps have many more vertices than sides. For point lights, the dark cap is potentially huge; culling it can also save a lot of fill rate. This is the effect we see in cartoons when a kitten casts a lion's shadow by standing in front of a flashlight. This magnifying effect was illustrated in Figure 9-7, where the dark cap for the model is several times larger than the model itself.

9.5.4 Uncapped Performance

Even when the caps cannot be culled, we can use another technique to remove them in a special case: when the viewport is unshadowed.

In the mathematical formulation, we used rays from a point to infinity away from the viewer. In the implementation, these rays were simulated by rendering polygons to the stencil buffer. We moved the far clipping plane to infinity and sent rays away from the viewer so that we wouldn't miss any intersections between the point and infinity because of clipping.

It's possible to count in the other direction. To count away from the viewer, increment or decrement the stencil buffer when the depth test fails. To count toward the viewer, increment or decrement when the depth test passes. When the viewport is not in a shadow volume, the net count (entering minus exiting intersections) along a line segment from an unshadowed point to the image plane is zero, because the line must pass through exactly as many entering intersections as exiting ones to get from an unshadowed point to an unshadowed viewport. If the point is shadowed, the net count will be nonzero. Of course, we can count in this direction only if the viewport is not in a shadow itself; otherwise, the count will be off by the number of shadow volumes enclosing each viewport pixel. Figure 9-7 showed a case where this optimization can be used because the shadows, which stretch back into the scene, do not enclose the viewport. Figure 9-2 showed an example where it cannot be used, because the viewport is in the shadow cast by the pumpkin; in fact, everything in the scene is in shadow except the triangles of the ground plane, where light shines out of the eyes.
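The off-by-one problem can be checked with a one-dimensional model of a single view ray. This is a hypothetical illustration, not rendering code: each shadow volume is reduced to the depths where the ray enters it (front face) and leaves it (back face), the eye sits at depth 0, and the two functions mimic the stencil updates of counting away from and toward the viewer.

```cpp
#include <cassert>
#include <vector>

// One shadow volume crossed by a view ray: the depths at which the ray enters
// (front face) and leaves (back face) it. A negative `enter` means the volume
// encloses the eye.
struct Crossing { float enter, exit; };

// Counting away from the viewer (depth fail): only faces behind the visible
// point at depth p are counted. Back faces increment, front faces decrement.
int countDepthFail(const std::vector<Crossing>& vols, float p) {
  int s = 0;
  for (const Crossing& c : vols) {
    if (c.exit > p) ++s;
    if (c.enter > p) --s;
  }
  return s;
}

// Counting toward the viewer (depth pass): only faces between the eye and p
// are counted. Faces behind the eye are clipped away and never seen.
int countDepthPass(const std::vector<Crossing>& vols, float p) {
  int s = 0;
  for (const Crossing& c : vols) {
    if (c.enter > 0.0f && c.enter < p) ++s;
    if (c.exit > 0.0f && c.exit < p) --s;
  }
  return s;
}
```

For a volume in front of the eye both counts agree, but for a volume that encloses the eye the depth-pass count misses the front face behind the viewer and reports a shadowed point as lit.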

The advantage of counting toward the viewer is that we don't need to render the light and dark caps. The light cap will always fail the depth test, because it is inside the shadow caster, so there is no reason to render it. Because we're counting from visible points to the viewer, there is no way for the dark cap (which is at infinity) to create intersections, and so we don't need to render it, because it can't change the result.

This optimization requires two changes to the code:

  1. We need to test whether the viewport is (conservatively) in a shadow volume. This test is performed separately for each shadow caster; we can choose our counting direction independently for each caster and still get a correct result.
  2. If the viewport is not in a shadow volume, we need to reverse the increment/decrement sense of the stencil operations (for that caster only).

Figure 9-9 shows the occlusion pyramid of the viewport. The tip is at the light source (which is at infinity if it is a directional light), and the base is the viewport. If the bounding box of the shadow caster intersects this pyramid, the viewport may be shadowed and the optimization cannot be used. In that case, we must render with the normal depth-fail operations and draw both caps, if visible. If the bounding box does not intersect the pyramid, we can change the stencil operations.

Figure 9-9 The Occlusion Pyramid

The occlusion pyramid can be on either side of the viewport. If the shadow caster intersects the green pyramid, the "uncapped" optimization cannot be used.

For counting toward the viewer, set the stencil operations as follows:

// Increment on front faces, decrement
// on back faces (a.k.a. z-pass rendering)
glActiveStencilFaceEXT(GL_FRONT);
glStencilOp(GL_KEEP, GL_KEEP, GL_INCR_WRAP_EXT);
glActiveStencilFaceEXT(GL_BACK);
glStencilOp(GL_KEEP, GL_KEEP, GL_DECR_WRAP_EXT);

Because this is "uncapped" rendering, omit the code to render shadow volume caps entirely from this case.

9.6 Fill-Rate Optimizations

Fill rate is the Achilles' heel of shadow volumes. Shadow volumes cover many pixels and have a lot of overdraw. This is particularly troublesome for point lights, which create shadows that get bigger the farther they are from the caster. Fortunately, point lights also have great optimization potential, because their attenuation creates a range beyond which illumination is practically zero. We've already discussed not marking shadows for casters outside this range and not rendering illumination on objects outside the range. Now we'll look at three ways to reduce the fill rate required for casters inside the range: finite volumes, XY clipping, and z-bounds.

9.6.1 Finite Volumes

The range of a point light forms a sphere. Objects outside this sphere don't receive illumination, so there is no need to cast shadows beyond the sphere. Instead of extruding shadow volumes to infinity, we can extend them by the radius of the sphere and save the fill rate of rendering infinite polygons. This is a straightforward change to the vertex shader that can recoup significant fill rate. However, it may increase geometry processing, because the dark cap is then more likely to be on-screen and therefore less likely to be culled.

An alternative is to still create polygons that stretch to infinity, but clip them to the light radius in 2D, as described in the next optimization.

9.6.2 XY Clipping

The range sphere projects to an ellipse on screen. Only pixels within that ellipse can be illuminated. We don't need to render shadow polygons or illumination outside of this ellipse. However, hardware supports a rectangular clipping region, not an elliptical one. We could compute the bounding box of the projected ellipse, but it is more convenient to use the 2D bounding box of the projected 3D bounding box surrounding the light range. Although the fixed-function pipeline supports only radial attenuation, artists can achieve more controlled effects by specifying an arbitrary attenuation function over the cubic volume about a light, as done in Doom 3. Attenuation can fall off arbitrarily within a box, so we just use that box as the light range. Clip the light's box to the view frustum. If it is not entirely clipped, project all vertices of the remaining polyhedron onto the viewport and bound them. That final 2D bound is used as the rectangular clipping region. Set the clipping region with the glScissor command:

glScissor(left, bottom, width, height);  // (left, bottom) = lower-left corner
glEnable(GL_SCISSOR_TEST);
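The 2D bound can be computed as in the sketch below, assuming the light's box has already been clipped to the view frustum and its remaining vertices transformed to clip space, so every w is positive. The result is rounded outward to whole pixels. Note that glScissor expects the lower-left corner of the rectangle, because OpenGL window coordinates start at the bottom left. Rect and scissorFromClipPoints are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical types. Points are clip-space positions of the light box's
// vertices after the box has been clipped to the view frustum (so w > 0).
struct Vec4 { float x, y, z, w; };
struct Rect { int left, bottom, width, height; };   // glScissor arguments

// Window-space bounding rectangle for a viewport at (vx, vy) of size
// vw x vh. As in OpenGL, the window origin is the lower-left corner.
Rect scissorFromClipPoints(const std::vector<Vec4>& clip,
                           int vx, int vy, int vw, int vh) {
  if (clip.empty()) return Rect{vx, vy, 0, 0};   // light box fully clipped
  float xMin = 1e30f, yMin = 1e30f, xMax = -1e30f, yMax = -1e30f;
  for (const Vec4& p : clip) {
    assert(p.w > 0.0f);
    float wx = vx + (p.x / p.w + 1.0f) * 0.5f * vw;
    float wy = vy + (p.y / p.w + 1.0f) * 0.5f * vh;
    xMin = std::min(xMin, wx);
    xMax = std::max(xMax, wx);
    yMin = std::min(yMin, wy);
    yMax = std::max(yMax, wy);
  }
  // Round outward to whole pixels, then clamp to the viewport.
  int left   = std::max(vx, static_cast<int>(std::floor(xMin)));
  int bottom = std::max(vy, static_cast<int>(std::floor(yMin)));
  int right  = std::min(vx + vw, static_cast<int>(std::ceil(xMax)));
  int top    = std::min(vy + vh, static_cast<int>(std::ceil(yMax)));
  return Rect{left, bottom, std::max(0, right - left), std::max(0, top - bottom)};
}
```

The fields map directly onto glScissor(r.left, r.bottom, r.width, r.height).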

Figure 9-10 shows a Quake 3 character standing outside a building. This is the scene from Figure 9-7, but now the camera has moved backward. The single point light creates shadow volumes from the character and the building (shown in yellow), which would fill the screen were they not clipped to the effective bounds of the light. The scissor region is shown in the right half of the figure as a white box. The left half of the figure shows the visible scene, where the effect of clipping is not apparent because the light does not illuminate the distant parts of the building. For this scene, rendering only the shadow volume pixels within the scissor region cuts the fill-rate cost in half.

Figure 9-10 Clipping in

9.6.3 Z-Bounds

If the point at a given pixel is outside of the light range—because it is either closer to the viewer or farther from the viewer than the range bounds—that point cannot be illuminated, so we don't need to make a shadow-marking or illumination pass over that pixel. Restricting those passes to a specific depth range means that we pay fill rate for only those pixels actually affected by the light, which is potentially fewer pixels than those within the 2D bounds of the light.

The glDepthBoundsEXT function lets us set this behavior:

glEnable(GL_DEPTH_BOUNDS_TEST_EXT);
glDepthBoundsEXT(zMin, zMax);

This setting prevents rendering a pixel where the depth buffer already has a value outside the range [zMin, zMax]—that is, where the point visible at that pixel (rendered in the ambient pass) is outside the range. This is not the same as a clipping plane, which prevents rendering new pixels from polygons past a bound.

Figure 9-11 shows a viewer looking at an object illuminated from a point light. The caster's shadow projects downward toward the rugged ground slope. The bold green portion of the ground can't possibly be shadowed by the caster. The depth-bounds test saves the fill rate of rendering the orange parts of the shadow volume because the visible pixels behind them (the bold green ones) are outside the bounds. Notice that the shadow volume itself is inside the bounds, but this is irrelevant—the depth bound applies to the pixel rendered in the ambient pass, not to the shadow geometry.

Figure 9-11 Depth Bounds

Note that the depth bounds are more restrictive than just the light range: they span the depth range of the intersection of the view frustum, the light range, and the shadow volume bounds. The arguments to the OpenGL function are window-space depth values in [0, 1], obtained by projecting camera-space points. If the intersection is a polyhedron whose camera-space vertices are stored in an array std::vector<Vector4> boundVert, the arguments are computed as:

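// boundVert holds camera-space vertices; the results are window-space
// depths, assuming the default glDepthRange(0, 1).
float zMin = 1.0f;
float zMax = 0.0f;
for (int v = static_cast<int>(boundVert.size()) - 1; v >= 0; --v) {
  Vector4 clip = projectionMatrix * boundVert[v];
  float z = 0.5f * (clip.z / clip.w) + 0.5f;

  zMin = std::min(zMin, z);
  zMax = std::max(zMax, z);
}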
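To see what values the depth bounds hold, here is a small sketch assuming the infinite-far-plane perspective matrix used earlier in the chapter and the default glDepthRange(0, 1); the names infinitePerspective and windowDepth are hypothetical. For a point at camera-space distance d in front of a near plane at distance n, the window depth works out to 1 - n/d, so half of the depth range is used between n and 2n.

```cpp
#include <cassert>
#include <cmath>

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };   // row-major, applied as P * v

// Perspective projection with the far plane at infinity: the limit of the
// usual gluPerspective-style matrix as the far distance goes to infinity.
Mat4 infinitePerspective(float fovyRadians, float aspect, float zNear) {
  float f = 1.0f / std::tan(0.5f * fovyRadians);
  return Mat4{{{f / aspect, 0, 0, 0},
               {0, f, 0, 0},
               {0, 0, -1, -2 * zNear},
               {0, 0, -1, 0}}};
}

// Window-space depth in [0, 1] of a camera-space point: the kind of value
// glDepthBoundsEXT compares against.
float windowDepth(const Mat4& P, const Vec4& eye) {
  float clipZ = P.m[2][0] * eye.x + P.m[2][1] * eye.y +
                P.m[2][2] * eye.z + P.m[2][3] * eye.w;
  float clipW = P.m[3][0] * eye.x + P.m[3][1] * eye.y +
                P.m[3][2] * eye.z + P.m[3][3] * eye.w;
  assert(clipW > 0.0f);   // point must be in front of the camera
  return 0.5f * (clipZ / clipW) + 0.5f;
}
```

With n = 1, points at distances 1, 2, and 4 map to depths 0, 0.5, and 0.75.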

9.7 Future Shadows

The current, highly optimized shadow volume method is the result of contributions from industry and academia over the past several decades. The basic method was introduced by Frank Crow at SIGGRAPH 1977 and has matured into the method described in this chapter. The history of shadow volumes and the individual contributions of several researchers and developers are summarized in technical reports available on the NVIDIA Developer Web site (Everitt and Kilgard 2002, McGuire et al. 2003). McGuire et al. 2003 gives a formal description and analysis of the method presented in this chapter.

Improving the performance of shadow volume generation through new optimizations continues to be an active research area. Silhouette determination has traditionally been performed on the CPU, which is a major limitation: it precludes the use of matrix skinning or other deformations in the vertex shader, and it makes rendering wait on CPU work.

Several solutions have been proposed for performing silhouette determination directly on programmable graphics hardware. Michael McCool (2001) proposed a method for computing the caster silhouettes from a shadow map. Brabec and Seidel (2003) push geometry encoded as colors through the pixel processor, where they compute silhouettes. They then read back the frame buffer and use it as a vertex buffer for shadow rendering. John Hughes and I recently described how to find silhouettes and extrude them into shadow volume sides entirely in a vertex shader using a specially precomputed mesh (McGuire and Hughes 2003).

Getting good-looking, high-performance soft shadows from area light sources with shadow volumes is another open research topic. Ulf Assarsson and Tomas Akenine-Möller have worked on this problem for some time. Their most recent paper, with Michael Dougherty and Michael Mounier (Assarsson et al. 2003), describes how to construct explicit geometry for the interior and exterior edges of the penumbra (the soft-shadow region) and makes heavy use of programmable hardware.

Several people have proposed joining the individual silhouette edges into connected strips so that quad strips (for point lights) and triangle fans (for directional lights) can be used to render the shadow volume sides. Alex Vlachos and Drew Card (2002) have been working on another simplification idea: culling and clipping nested shadow volumes, because they won't affect the final result.

All of these methods are experimental and have yet to be refined and proven in an actual game engine. If you are interested in moving beyond the capabilities of the current shadow volume method, these are good starting points. Hopefully, future research and graphics hardware will improve and accelerate these methods.

9.8 References

Assarsson, U., M. Dougherty, M. Mounier, and T. Akenine-Möller. 2003. "An Optimized Soft Shadow Volume Algorithm with Real-Time Performance." In Proceedings of the SIGGRAPH/Eurographics Workshop on Graphics Hardware 2003.

Brabec, S., and H. Seidel. 2003. "Shadow Volumes on Programmable Graphics Hardware." Eurographics 2003 (Computer Graphics Forum).

Everitt, Cass, and Mark Kilgard. 2002. "Practical and Robust Stenciled Shadow Volumes for Hardware-Accelerated Rendering." NVIDIA Corporation. Available online at http://developer.nvidia.com/object/robust_shadow_volumes.html

McCool, Michael. 2001. "Shadow Volume Reconstruction from Depth Maps." ACM Transactions on Graphics, January 2001, pp. 1–25.

McGuire, Morgan, and John F. Hughes. 2003. "NPR on Programmable Hardware." To appear in Proceedings of NPAR 2004, June 7–9, Annecy, France.

McGuire, Morgan, John F. Hughes, Kevin Egan, Mark Kilgard, and Cass Everitt. 2003. "Fast, Practical and Robust Shadows." Available online at http://developer.nvidia.com/object/fast_shadow_volumes.html . An early version appeared as Brown Univ. Tech. Report CS03-19.

Vlachos, Alex, and Drew Card. 2002. "Computing Optimized Shadow Volumes." In Game Programming Gems 3, edited by Dante Treglia. Charles River Media.

Tekkaman Blade robot model by Michael Mellor (mellor@iaccess.com.au); Tick model by Carl Schell (carl@cschell.com). Both available for download at http://www.polycount.com. Cathedral model by Sam Howell (sam@themightyradish.com), courtesy Sam Howell and Morgan McGuire. "The Tick" character is a trademark of New England Comics. Quake 2, Quake 3, and Doom 3 are trademarks of id Software.

Chapter 10. Cinematic Lighting

Fabio Pellacini
Pixar Animation Studios

Kiril Vidimce
Pixar Animation Studios

In this chapter, we present a simplified implementation of uberlight, a light shader that expresses the lighting model described by Ronen Barzel (1997, 1999). A superset of this model was developed over several years at Pixar Animation Studios and used for the production of animated movies such as the Walt Disney presentations of the Pixar Animation Studios films Toy Story, A Bug's Life, Monsters, Inc., and Finding Nemo.

Our Cg implementation is based on Barzel's original approach and on the RenderMan Shading Language implementation written by Larry Gritz (Barzel 1999). Further details about this lighting approach and its uses in movie production can be found in Apodaca and Gritz 1999 and Birn 2000.

10.1 Introduction

Lighting is an important aspect of computer cinematography, in which lights and shadows are used to convey mood and support storytelling (Calahan 1999). Although realism remains an important aspect of computer-generated imagery, lighting directors constantly cheat the physics of light to support the artistic depiction of animated movies. Performing these tricks on a real-world set is a daunting task that often requires hours of setup.

Freed of the limitations of real physics, developers of computer cinematography have been devising lighting models that let artists illuminate scenes intuitively. The lighting model presented in this chapter is an adaptation of the model developed over the last decade at Pixar Animation Studios and used in the production of most of our movies. See Figures 10-1, 10-2, and 10-3.

Figure 10-1 Barn Lights in .

Figure 10-2 Cookies Contribute to a Window Effect in .

Figure 10-3 Lighting Conveys Mood in .

10.2 A Direct Lighting Illumination Model

The shader we present in this chapter models only the shaping and controls of the light sources that illuminate the scene; it doesn't cover the intricacies of how to model the surface details and the light reflection behavior. (Some examples of interesting surface behaviors can be found in Apodaca and Gritz 1999.)

In general, the illumination model used in our movie production performs two kinds of operations, similar to the pseudocode shown here.

color illuminationModel()
{
  Compute the surface characteristic
  For each light {
    Evaluate the light source
    Compute the surface response
  }
}

First, we compute the surface shading information by performing various texture lookups, interpolating values over the mesh, and computing procedural patterns. Then we loop over each light source that illuminates the object and compute its contribution. We do this by evaluating first the light color and then the surface response to the illumination of each light.

In this chapter, we present a simple shader that computes the contribution of only one light for a plastic reflection model. Extending it to a more general solution for multiple lights and better-looking surfaces is left as an exercise for the reader.
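The structure of the pseudocode can be made concrete with a C++ sketch for point lights. It is only an illustration under stated assumptions, not the chapter's Cg shader: a Blinn-Phong specular term stands in for the plastic model, each light carries its own ambient, diffuse, and specular weights, and a per-object bitmask models light selection (Section 10.2.1). All type and field names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not the chapter's Cg interface.
struct Vec3 { float x, y, z; };
static Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 operator*(Vec3 a, Vec3 b) { return {a.x * b.x, a.y * b.y, a.z * b.z}; }
static Vec3 operator*(float s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 normalize(Vec3 a) { return (1.0f / std::sqrt(dot(a, a))) * a; }

// "Compute the surface characteristic" (here just stored values).
struct Surface {
  Vec3 P, N, color;     // position, unit normal, base color
  float shininess;      // Blinn-Phong exponent, a stand-in for "plastic"
  unsigned lightMask;   // selection: bit i set = responds to light i (i < 32)
};

// Per-light emission with separately tweakable term weights.
struct Light {
  Vec3 position, color;
  float ambient, diffuse, specular;
};

Vec3 illuminationModel(const Surface& s, const std::vector<Light>& lights,
                       Vec3 eye) {
  Vec3 result{0, 0, 0};
  Vec3 V = normalize(eye - s.P);
  for (std::size_t i = 0; i < lights.size() && i < 32; ++i) {
    if (!(s.lightMask & (1u << i))) continue;   // light selection
    const Light& l = lights[i];
    // "Evaluate the light source": an unshaped, unattenuated point light.
    Vec3 L = normalize(l.position - s.P);
    // "Compute the surface response".
    float diff = std::max(dot(s.N, L), 0.0f);
    float spec = diff > 0.0f
        ? std::pow(std::max(dot(s.N, normalize(L + V)), 0.0f), s.shininess)
        : 0.0f;
    result = result + l.ambient * (l.color * s.color)
                    + (l.diffuse * diff) * (l.color * s.color)
                    + (l.specular * spec) * l.color;
  }
  return result;
}
```

Clearing a bit in lightMask removes that light's contribution from the object without touching the light itself, which is the kind of per-object control described next.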

Our lighting model provides artists with control over various aspects of illumination: selection, color, shaping, shadowing, and texturing.

10.2.1 Selection

Each object in the scene can selectively respond to each individual light. This powerful characteristic of our lighting model lets the artist turn off lights on objects when additional light is creating an undesired effect. It also lets artists add extra lights that create a desired effect in a specific location without affecting the rest of the scene.

10.2.2 Color

A light's most noticeable properties are the color and the intensity that describe its emission. Similar to the OpenGL fixed-function lighting, our implementation provides separate weights for the ambient, diffuse, and specular illumination effects that artists can separately tweak. One of the most important aspects of our lighting model is that these terms can be freely changed per-object, letting the artist light entire sets with a small number of lights.

10.2.3 Shaping

To control regions of a scene that are illuminated by a light, real-world cinematographers commonly employ spotlights and rectangular lights (known as barn doors) to shape the light distribution. Our lighting model generalizes these concepts by providing two types of shaping:

10.2.4 Shadowing

Shadowing is an important aspect of our lighting model; shadows are probably the attributes that artists cheat most often. Shadows are tweaked not only for speed considerations, but also for the ability to control each little aspect of the shadow's look, which is so important in defining the overall mood of a movie. For example, compare the strong shadows in film noir movies with the almost invisible ones in musicals.

As with lighting intensity, artists decide which objects cast shadows and which ones receive them. The lighting designer can also cheat shadow positions by moving them relative to the light origin. For example, she can allow bright highlights in a character's eyes while making sure that the shadow does not cross the character's face.

Darkness

One of the biggest problems of a direct lighting model is that shadows tend to be too dark. This happens because most of the indirect illumination that naturally occurs in the environment is never computed. To mimic reality in our model, we've created lights that can be adjusted to change the density of shadows, by letting some light propagate through the objects in a scene. In our implementation, we use the diffuse contribution of the surface to color the shadow region. We believe this is better than using an ambient term, because our method lets us maintain those nice gradients that make the shadow believable. Later in the chapter, we elaborate on this topic.

Hue

The hue of a shadowed area in the real world is slightly different from that of a nearby unoccluded region—for example, notice the slight bluish tint of outdoor shadows on a bright, sunny day. To mimic this effect, we allow the artist to change the shadow color. Slight variations of the shadow hue can make the difference between a good-looking shadow and a fake-looking one. In practice, you should think of shadow casters "spraying" receivers with the shadow color, which is commonly black. See Figure 10-6 for images with different shadow colors.

Figure 10-6 Variations in Shadow Colors

Reflection

One important caveat concerns highlights. When computing the surface contribution in the shadow area, we use only the diffuse response to obtain those nice gradients seen in outdoor environments, but we don't want to see highlights in the shadow region, because we are cheating diffuse interreflections. To achieve this effect, we simply switch off the specular contribution in the shadow regions. This little adjustment is just one example of how light changes the reflection behavior of surfaces. Tweaks like these are used widely in movie production to achieve that specific look we hope viewers will love.
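One way to combine the density, hue, and reflection controls for a single light is sketched below. It is a hypothetical blend written for illustration, not the uberlight's actual code: the occluded color is the light color pushed toward the shadow color by the density and applied to the diffuse response only, so highlights disappear in shadow while the diffuse gradient survives.

```cpp
#include <algorithm>
#include <cassert>

struct Color { float r, g, b; };

static Color lerp(Color a, Color b, float t) {
  return {a.r + (b.r - a.r) * t, a.g + (b.g - a.g) * t, a.b + (b.b - a.b) * t};
}
static Color scale(Color c, float s) { return {c.r * s, c.g * s, c.b * s}; }

// Hypothetical combination of the shadow controls for one light.
//   occlusion: 0 = fully lit, 1 = fully shadowed (e.g. a shadow-map result)
//   density:   0 = the shadow lets all light through, 1 = fully dense
//   diffuse, specular: the surface's scalar responses to this light
Color shadowedLight(Color lightColor, Color shadowColor, float diffuse,
                    float specular, float occlusion, float density) {
  assert(density >= 0.0f && density <= 1.0f);
  occlusion = std::min(std::max(occlusion, 0.0f), 1.0f);
  // Unoccluded: full diffuse plus specular response.
  Color lit = scale(lightColor, diffuse + specular);
  // Occluded: light tinted toward the shadow color according to density,
  // diffuse response only, so no highlights appear in shadow.
  Color inShadow = scale(lerp(lightColor, shadowColor, density), diffuse);
  return lerp(lit, inShadow, occlusion);
}
```

With a black shadow color and full density this reduces to the usual black shadow; lowering the density keeps a fraction of the diffuse shading, and a bluish shadow color tints that remainder.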

Shadow Maps

Of the various techniques used to implement shadows, we use shadow maps in our model for their simplicity and flexibility, and because we use them often in our movies. Although the shader in this chapter is based on such an algorithm, we encourage the reader to experiment with other shadow algorithms. The important aspect of the shader is not how we decide whether a pixel is in shadow, but how we use this information.

Shadow Blurring

The most important aspect missing from our implementation is shadow blurring. Artists often adjust the softness of shadow edges in order to cheat area lights or simply to get the particular look that the director wants. Blurring shadow edges is particularly hard to do efficiently. Various techniques are available, but presenting them is outside the scope of this chapter.

10.2.5 Texturing

Finally, we added projective texture support to allow a wide variety of effects, such as slide projectors and fake shadows from off-screen objects. These tricks are known in the movie production world as cookies; they are also used in game production, but less often. While game developers use texture projection for shape, coloring, and shadowing effects, movie creators tend to use soft cookies to enrich visual details and to add special effects, such as off-camera shadow casters or strangely shaped lights.
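A cookie lookup reduces to projective texturing. The sketch assumes a row-major world-to-light projection matrix (a hypothetical stand-in for a matrix such as the WorldToShadowProj parameter below), maps the projected point from [-1, 1] to [0, 1] texture coordinates, and samples a small grayscale cookie with nearest-neighbor filtering; the light color would then be multiplied by the sample. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };   // row-major, applied as M * v

static Vec4 mul(const Mat4& M, const Vec4& v) {
  const float in[4] = {v.x, v.y, v.z, v.w};
  float out[4] = {0.0f, 0.0f, 0.0f, 0.0f};
  for (int r = 0; r < 4; ++r)
    for (int c = 0; c < 4; ++c) out[r] += M.m[r][c] * in[c];
  return Vec4{out[0], out[1], out[2], out[3]};
}

// Projects world-space point P through the light's "slide projector" and
// returns cookie texture coordinates (s, t) in [0, 1]. Returns false if P is
// behind the projector or outside the projected slide.
bool cookieCoords(const Mat4& worldToLightProj, const Vec4& P,
                  float& s, float& t) {
  Vec4 c = mul(worldToLightProj, P);
  if (c.w <= 0.0f) return false;          // behind the projector
  s = 0.5f * (c.x / c.w) + 0.5f;          // [-1, 1] -> [0, 1]
  t = 0.5f * (c.y / c.w) + 0.5f;
  return s >= 0.0f && s <= 1.0f && t >= 0.0f && t <= 1.0f;
}

// Nearest-neighbor lookup in a w x h grayscale cookie (row-major texels):
// 0 blocks the light, 1 lets it through unchanged.
float sampleCookie(const float* texels, int w, int h, float s, float t) {
  int i = std::min(w - 1, static_cast<int>(s * w));
  int j = std::min(h - 1, static_cast<int>(t * h));
  return texels[j * w + i];
}
```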

10.2.6 Results

Figures 10-7 and 10-8 illustrate the use of the uberlight to illuminate the head of a character from Pixar's short film Geri's Game. The surface of the model is flat and plastic-like, and we don't apply any material-related textures on it. By using a simpler surface model for the object, we can better emphasize the various effects we can obtain by using just the light shader with different parameters. The proper combination of light and surface modeling brings this character to life, as in the original Pixar short.

Figure 10-7 Lighting Geri

Figure 10-8 Lighting Styles

10.3 The Uberlight Shader

Listings 10-1 and 10-2 show the source code of the uberlight shader, based on the one by Larry Gritz in Barzel 1999.

Listing 10-1. The Vertex Program for an Uberlight-Like Shader

void uberlight_vp(
  varying float4 Pobject : POSITION,       // Vertex position in object space
  varying float3 Nobject : NORMAL,         // Vertex normal in object space
  varying float3 VertexColor : COLOR0,     // Vertex color
  uniform float4x4 ModelViewProj,          // ModelViewProj matrix
  uniform float4x4 ObjectToWorld,          // ObjectToWorld matrix
  uniform float4x4 ObjectToWorldIT,        // Inverse transpose of the
                                           // ObjectToWorld matrix
  uniform float4x4 WorldToLight,           // Light space
  uniform float4x4 WorldToLightIT,         // Inverse transpose of light
                                           // space to transform normals
  uniform float4x4 WorldToShadowProj,      // Light space concatenated with
                                           // the projection matrix used for
                                           // the shadow. This defines
                                           // shadow space.
  uniform float3 CameraPosInWorld,         // Camera position in world space
  uniform float ShadowBias,                // Shadow bias
  out float4 HPosition : POSITION,         // Rasterizer position
  out float3 CameraPosInLight : TEXCOORD0, // Camera position in light space
  out float3 Plight : TEXCOORD1,           // Interpolated position
                                           // in light space
  out float3 Nlight : TEXCOORD2,           // Interpolated normal
                                           // in light space
  out float4 ShadowUV : TEXCOORD3,         // Shadow UV
  out float3 Color : COLOR0)               // Surface color
{
  // Compute coordinates for the rasterizer
  HPosition = mul(ModelViewProj, Pobject);

  // Compute world space pos and normal
  float4 Pworld = mul(ObjectToWorld, Pobject);
  float3 Nworld = mul(ObjectToWorldIT, float4(Nobject, 0)).xyz;

  // Compute the position of the point in light space
  CameraPosInLight = mul(WorldToLight,
                         float4(CameraPosInWorld, 1)).xyz;
  Plight = mul(WorldToLight, Pworld).xyz;
  Nlight = mul(WorldToLightIT, float4(Nworld, 0)).xyz;

  // Compute the U-V for the shadow and texture projection
  float4 shadowProj = mul(WorldToShadowProj, Pworld);
  // Rescale x, y to the range 0..1
  ShadowUV.xy = 0.5 * (shadowProj.xy + shadowProj.ww);
  // When transforming z, remember to apply the bias
  ShadowUV.z = 0.5 * (shadowProj.z + shadowProj.w - ShadowBias);
  ShadowUV.w = shadowProj.w;

  // Pass the color as is
  Color = VertexColor;
}
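The last few lines of the vertex program rescale the shadow-space position so that the projective divide performed by tex2Dproj lands in the range [0, 1]. The following C++ sketch reproduces that arithmetic on the CPU so it can be checked by hand. The struct and function names are hypothetical, and the sketch assumes an OpenGL-style clip space in which x, y, and z lie in [-w, w].

```cpp
#include <cassert>
#include <cmath>

// Homogeneous 4-vector standing in for Cg's float4 (hypothetical helper).
struct Float4 { float x, y, z, w; };

// Mirrors the vertex program: map x, y, and z from [-w, w] to [0, w],
// subtracting the bias from z before the mapping.
Float4 shadowUVFromClip(Float4 p, float shadowBias) {
  Float4 uv;
  uv.x = 0.5f * (p.x + p.w);
  uv.y = 0.5f * (p.y + p.w);
  uv.z = 0.5f * (p.z + p.w - shadowBias);
  uv.w = p.w;
  return uv;
}

// The projective divide that tex2Dproj applies before the lookup/compare.
float projectedU(Float4 uv) { return uv.x / uv.w; }
float projectedV(Float4 uv) { return uv.y / uv.w; }
float projectedDepth(Float4 uv) { return uv.z / uv.w; }
```

Because the bias is subtracted before the divide, its effect on the compared depth is ShadowBias / (2w): the same ShadowBias value produces a smaller depth offset for points with larger w.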

Listing 10-2. The Fragment Program for an Uberlight-Like Shader

// SHADER PARAMETERS ================================================

// Superellipse params
struct SuperellipseShapingParams {
  float width, height;
  float widthEdge, heightEdge;
  float round;
};

// Distance shaping params
struct DistanceShapingParams {
  float near, far;
  float nearEdge, farEdge;
};

// Light params
struct LightParams {
  float3 color;   // light color
  float3 weights; // light weights (ambient, diffuse, specular)
};

struct SurfaceParams {
  float3 weights;   // surface weights (ambient, diffuse, specular)
  float  roughness; // roughness
};
 
 
// BRDF/LIGHT INTERACTION ===========================================

// Compute the light direction
float3 computeLightDir(float3 Plight)
{
  // Spot only
  return -normalize(Plight);
}

// Ambient contribution of lit
float ambient(float3 litResult)
{
  return litResult.x;
}

// Diffuse contribution of lit
float diffuse(float3 litResult)
{
  return litResult.y;
}

// Specular contribution of lit
float specular(float3 litResult)
{
  return litResult.z;
}
 
 
// SUPERELLIPSE =====================================================
float computeSuperellipseShaping(
  float3 Plight,                    // Point in light space
  bool barnShaping,                 // Barn shaping
  SuperellipseShapingParams params) // Superellipse shaping params
{
  if (!barnShaping) {
    return 1;
  } else {
    // Project the point onto the z == 1 plane
    float2 Pproj = Plight.xy / Plight.z;
    // Because we want to evaluate the superellipse
    // in the first quadrant, for simplicity, get the right values
    float a = params.width;
    float A = params.width + params.widthEdge;
    float b = params.height;
    float B = params.height + params.heightEdge;

    float2 pos = abs(Pproj);
    // On the light's axis the pow() terms below would raise zero to a
    // negative power; such a point is inside the shape, so it is fully lit.
    if (pos.x == 0 && pos.y == 0) {
      return 1;
    }

    // Evaluate the superellipse in the first quadrant
    float exp1 = 2.0 / params.round;
    float exp2 = -params.round / 2.0;
    float inner = a * b * pow(pow(b * pos.x, exp1) +
                              pow(a * pos.y, exp1), exp2);
    float outer = A * B * pow(pow(B * pos.x, exp1) +
                              pow(A * pos.y, exp1), exp2);
    return 1 - smoothstep(inner, outer, 1);
  }
}
// DISTANCE SHAPING =================================================
float computeDistanceShaping(
  float3 Plight,                // Point in light space
  bool barnShaping,             // Barn shaping
  DistanceShapingParams params) // Distance shaping params
{
  float depth;
  if (barnShaping) {
    // Spotlight: distance along the light's viewing axis (-z)
    depth = -Plight.z;
  } else {
    // Omni light: radial distance from the light
    depth = length(Plight);
  }

  return smoothstep(params.near - params.nearEdge, params.near, depth) *
         (1 - smoothstep(params.far, params.far + params.farEdge, depth));
}
 
 
// MAIN =============================================================
float4 uberlight_fp(
  float3 CameraPosInLight : TEXCOORD0, // Camera position in light space
  float3 Plight : TEXCOORD1,           // Interpolated position in light space
  float3 Nlight : TEXCOORD2,           // Interpolated normal in light space
  float4 ShadowUV : TEXCOORD3,         // Shadow UV

  // SURFACE PROPERTIES ------------------------
  float3 SurfaceColor : COLOR0,        // Surface color
  uniform SurfaceParams Surface,       // Other surface params
                                       // (weights, roughness)

  // LIGHT PROPERTIES --------------------------
  uniform LightParams Light,           // Light properties

  // SHAPING -----------------------------------
  // Choose between barn shaping (superelliptic pyramid)
  // and omni shaping
  uniform bool BarnShaping,
  uniform SuperellipseShapingParams SuperellipseShaping, // Superellipse
                                                         // shaping
  uniform DistanceShapingParams DistanceShaping,         // Distance
                                                         // shaping

  // DISTANCE FALLOFF --------------------------

  // COOKIES AND SHADOWS -----------------------
  uniform sampler2D Shadow,            // Shadow texture
  uniform float3 ShadowColor,          // Shadow color
  uniform sampler2D Cookie,            // Cookie texture
  uniform float CookieDensity) : COLOR // Cookie density
{
  // TRANSFORM VECTORS TO LIGHT SPACE ---------------------
  // Compute the normal in light space (normalize after vertex
  // interpolation)
  float3 N = normalize(Nlight);
  // Compute the light direction
  float3 L = computeLightDir(Plight);
  // Compute the view direction (vector from the point to the eye)
  float3 V = normalize(CameraPosInLight - Plight);
  // Compute the half-angle for the specular term
  float3 H = normalize(L + V);

  // COMPUTE THE TEXTURE PROJECTION - COOKIE --------------
  float3 cookieColor = tex2Dproj(Cookie, ShadowUV).xyz;
  Light.color = lerp(Light.color, cookieColor, CookieDensity);

  // COMPUTE THE SHADOW EFFECT ---------------------------
  // Get the amount of shadow (1 = fully lit, 0 = fully shadowed)
  float shadow = tex2Dproj(Shadow, ShadowUV).x;
  // Modify the light color so that it blends with the shadow color
  // in the shadow areas
  float3 mixedLightColor = lerp(ShadowColor, Light.color, shadow);

  // COMPUTE THE ATTENUATION DUE TO SHAPING --------------
  float attenuation = 1;
  // Contribution from the superellipse shaping
  attenuation *= computeSuperellipseShaping(Plight, BarnShaping,
                                            SuperellipseShaping);
  // Contribution from the distance shaping
  attenuation *= computeDistanceShaping(Plight, BarnShaping,
                                        DistanceShaping);

  // APPLY TO SURFACE ------------------------------------
  // Here you should substitute other code for different
  // surface reflection models. This code computes the lighting
  // for a plastic-like surface.

  // Lighting computation. lit() takes a specular exponent as its
  // third argument, so Surface.roughness acts as a shininess exponent.
  float3 litResult = lit(dot(N, L), dot(N, H), Surface.roughness).xyz;

  // Multiply by the surface and light weights
  litResult *= Surface.weights * Light.weights;

  // Compute the ambient, diffuse, and specular final colors.
  // For the ambient term, use the color of the light as is
  float3 ambientColor = Light.color * SurfaceColor * ambient(litResult);
  // For the diffuse term, use the color of the light
  // mixed with the color in the shadow
  float3 diffuseColor = mixedLightColor * SurfaceColor *
                        diffuse(litResult);
  // The specular color is simply the light color times the specular
  // term, because we want to obtain white highlights regardless of the
  // surface color. Our shadows won't be fully black, so we want to
  // make sure that the highlights do not appear in shadow.
  float3 specularColor = mixedLightColor * shadow *
                         specular(litResult);

  // Compute the final color
  float3 color = attenuation * (ambientColor + diffuseColor +
                                specularColor);

  // Return the final color with full opacity
  return float4(color, 1);
}
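To experiment with the shaping parameters outside a shader, the two attenuation functions can be transcribed to C++ and evaluated at a few points. This is a sketch, not production code: smoothstepf is written out by hand to match the Cg definition of smoothstep, the parameters are passed as plain floats instead of the Cg structs, the on-axis guard and the omni distance length(Plight) follow the listing above, and the test values are arbitrary.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Same definition as Cg's smoothstep: Hermite interpolation of x over [a, b].
float smoothstepf(float a, float b, float x) {
  float t = std::clamp((x - a) / (b - a), 0.0f, 1.0f);
  return t * t * (3.0f - 2.0f * t);
}

// Superellipse (barn) shaping for a light-space point (x, y, z).
// Returns 1 inside the inner shape, 0 outside the outer shape.
float superellipseShaping(float x, float y, float z,
                          float width, float height,
                          float widthEdge, float heightEdge,
                          float roundness) {
  assert(roundness > 0.0f);  // both exponents divide by roundness
  float px = std::fabs(x / z), py = std::fabs(y / z);
  if (px == 0.0f && py == 0.0f) return 1.0f;  // on the axis: fully lit
  float a = width, A = width + widthEdge;
  float b = height, B = height + heightEdge;
  float e1 = 2.0f / roundness, e2 = -roundness / 2.0f;
  float inner = a * b * std::pow(std::pow(b * px, e1) + std::pow(a * py, e1), e2);
  float outer = A * B * std::pow(std::pow(B * px, e1) + std::pow(A * py, e1), e2);
  return 1.0f - smoothstepf(inner, outer, 1.0f);
}

// Distance shaping: fades in over [nearD - nearEdge, nearD] and out over
// [farD, farD + farEdge]. Spotlights use the depth along -z; omni lights
// use the radial distance.
float distanceShaping(float x, float y, float z, bool barn,
                      float nearD, float farD, float nearEdge, float farEdge) {
  float depth = barn ? -z : std::sqrt(x * x + y * y + z * z);
  return smoothstepf(nearD - nearEdge, nearD, depth) *
         (1.0f - smoothstepf(farD, farD + farEdge, depth));
}
```

With width = height = 1, edges of 0.5, and roundness 1, a point at x/z = 1.25 sits halfway through the soft edge and receives an attenuation of 0.5. The omni case shows why the radial distance matters: a point at (3, 4, 0) is 5 units from the light, but its z component alone is 0.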

10.4 Performance Concerns

10.4.1 Speed

We can easily speed up the uberlight shader by replacing certain analytic computations with texture lookups. The textures are generated by sampling, at discrete points, the functions we want to avoid computing in the shader. As long as the textures have a high enough resolution (potentially re-creating them based on the scene and camera parameters), the image quality can still be very good. Because of the production and quality demands of offline rendering, this kind of optimization is rarely performed there.
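The texture-lookup optimization amounts to tabulating a function once and interpolating between table entries at shading time. A minimal C++ sketch of the idea in one dimension follows; smoothstep stands in for the function being replaced, and the table sizes are arbitrary choices for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// The function being replaced: smoothstep on [0, 1].
float smoothstep01(float x) {
  x = std::clamp(x, 0.0f, 1.0f);
  return x * x * (3.0f - 2.0f * x);
}

// "Bake" f into a table of n evenly spaced samples over [0, 1].
std::vector<float> bakeTable(float (*f)(float), std::size_t n) {
  assert(n >= 2);
  std::vector<float> table(n);
  for (std::size_t i = 0; i < n; ++i)
    table[i] = f(static_cast<float>(i) / static_cast<float>(n - 1));
  return table;
}

// Linearly interpolated lookup: the 1D analogue of a filtered texture fetch.
float lookup(const std::vector<float>& table, float x) {
  float s = std::clamp(x, 0.0f, 1.0f) * static_cast<float>(table.size() - 1);
  std::size_t i = std::min(static_cast<std::size_t>(s), table.size() - 2);
  float frac = s - static_cast<float>(i);
  return table[i] * (1.0f - frac) + table[i + 1] * frac;
}

// Largest absolute difference between table lookups and the exact function.
float maxError(const std::vector<float>& table, float (*f)(float), int probes) {
  float worst = 0.0f;
  for (int k = 0; k <= probes; ++k) {
    float x = static_cast<float>(k) / probes;
    worst = std::max(worst, std::fabs(lookup(table, x) - f(x)));
  }
  return worst;
}
```

For a smooth function, the interpolation error shrinks with the square of the table spacing, which is why a modest texture resolution is usually enough. A 256-entry table reproduces smoothstep to within about 1e-5, while a 16-entry table is visibly coarser.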

10.4.2 Expense

The most expensive code in this shader is the computation of the light's shape-based attenuation. We can construct a texture map that evaluates the superelliptical shaping for a given set of barn parameters and then use the shadow texture coordinates to look up the barn map contribution in light space as a projective texture. Using a barn map dramatically reduces the number of shader instructions, and it can more than double the shading speed (as measured on an NVIDIA Quadro FX 2000 board).

10.4.3 Optimization

When neither the camera nor the objects move in the scene, we can also optimize camera-dependent and scene-dependent shading components of the shader (such as the distance-based shaping). This is a typical usage scenario for a lighting artist who is modifying lighting parameters to light a frame in a given shot. When the artist replaces both the superellipse and the distance-based shaping with two texture lookups, the modified shader performs more than three times faster than the original one.

If you choose this approach to optimize your shaders, consider using a high-level language to create these maps on the fly. In our proprietary multipass interactive renderer, we define the creation of these maps as separate passes. Once created, the pass results are cached and constantly reused. Only when the parameters that affect these maps change are the passes marked as dirty, queued for reevaluation, and once again cached.
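The cache-and-invalidate scheme described above can be sketched as a small C++ class. Everything here is hypothetical, since the renderer itself is proprietary: BarnParams stands in for whatever parameters a map depends on, and the builder function stands in for the pass that renders the map.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical parameters that a cached barn map depends on.
struct BarnParams {
  float width, height, widthEdge, heightEdge, roundness;
  bool operator==(const BarnParams& o) const {
    return width == o.width && height == o.height &&
           widthEdge == o.widthEdge && heightEdge == o.heightEdge &&
           roundness == o.roundness;
  }
};

// A map produced by a render pass, cached until its parameters change.
class CachedMap {
 public:
  using Builder = std::function<std::vector<float>(const BarnParams&)>;

  CachedMap(Builder build, const BarnParams& params)
      : build_(std::move(build)), params_(params) {}

  // Marks the map dirty only if a parameter actually changed.
  void setParams(const BarnParams& params) {
    if (!(params == params_)) {
      params_ = params;
      dirty_ = true;
    }
  }

  // Re-runs the pass if the map is dirty; otherwise returns the cached copy.
  const std::vector<float>& get() {
    if (dirty_) {
      data_ = build_(params_);
      dirty_ = false;
      ++rebuilds_;
    }
    return data_;
  }

  int rebuilds() const { return rebuilds_; }

 private:
  Builder build_;
  BarnParams params_;
  std::vector<float> data_;
  bool dirty_ = true;
  int rebuilds_ = 0;
};
```

Comparing against the previous parameters before marking the map dirty means that an artist re-entering the same value does not trigger a needless re-render.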

10.5 Conclusion

Our lighting model is a simple attempt to provide a comprehensive set of lighting controls that covers most effects used daily by lighting artists. Although our implementation covers a wide variety of effects, many more can be added (and are indeed added daily) to allow the artist more flexibility and expressiveness. Examples of such controls are found in Barzel 1999. Readers may extend our source code examples to cover these and other algorithms.

The lighting controls presented here cover only part of the look for which the light source is responsible. When you develop a full illumination model, be aware that the reflection characteristics of a surface are also important; they are what distinguish the appearance of the materials in the scene.

10.6 References

Apodaca, Anthony A., and Larry Gritz, eds. 1999. Advanced RenderMan: Creating CGI for Motion Pictures. Morgan Kaufmann.

Barzel, Ronen. 1997. "Lighting Controls for Computer Cinematography." Journal of Graphics Tools 2(1), pp. 1–20. Available online at http://www.acm.org/jgt/papers/Barzel97

Barzel, Ronen. 1999. "Lighting Controls for Computer Cinematography." In Advanced RenderMan: Creating CGI for Motion Pictures, edited by Anthony A. Apodaca and Larry Gritz. Morgan Kaufmann. Code for the chapter was provided by Larry Gritz.

Birn, Jeremy. 2000. Digital Lighting and Rendering. New Riders Publishing.

Calahan, Sharon. 1999. "Storytelling through Lighting: A Computer Graphics Perspective." In Advanced RenderMan: Creating CGI for Motion Pictures, edited by Anthony A. Apodaca and Larry Gritz. Morgan Kaufmann.

Chapter 11. Shadow Map Antialiasing

Michael Bunnell
NVIDIA

Fabio Pellacini
Pixar Animation Studios

11.1 Introduction

Shadow mapping is the method of choice for creating shadows in high-end rendering for motion pictures and television. However, it has been problematic to use shadow mapping in real-time applications, such as video games, because of aliasing problems in the form of magnified jaggies. This chapter shows how to significantly reduce shadow map aliasing in a shader. It describes how to implement a simplified version of percentage-closer filtering that makes the most out of the GPU's shadow-mapping hardware to render soft-edged, antialiased shadows at real-time rates.

Shadow mapping involves projecting a shadow map on geometry and comparing the shadow map values with the light-view depth at each pixel. If the projection causes the shadow map to be magnified, aliasing in the form of large, unsightly jaggies will appear at shadow borders. Aliasing can usually be reduced by using higher-resolution shadow maps or by increasing the effective shadow map resolution with techniques such as perspective shadow maps (Stamminger and Drettakis 2002). However, neither perspective shadow mapping nor a higher shadow map resolution helps when the light travels nearly parallel to the shadowed surface, because the magnification then approaches infinity.

High-end rendering software solves the aliasing problem by using a technique called percentage-closer filtering.

11.2 Percentage-Closer Filtering

Unlike normal textures, shadow map textures cannot be prefiltered to remove aliasing. Instead, multiple shadow map comparisons are made per pixel and averaged together. This technique is called percentage-closer filtering (PCF) because it calculates the percentage of the surface that is closer to the light and, therefore, not in shadow.
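The difference between filtering depths and filtering comparison results is easy to see on four texels, two of which hold a nearby occluder. In this C++ sketch (the names are ours; 1 means lit, 0 means shadowed, and no depth bias is applied), averaging the depths first still produces a hard 0-or-1 answer, while percentage-closer filtering produces the fraction of samples that are lit.

```cpp
#include <cassert>
#include <vector>

// One depth compare: 1 if the receiver is not farther than the stored
// occluder depth (lit), 0 otherwise (shadowed).
float compare(float storedDepth, float receiverDepth) {
  return receiverDepth <= storedDepth ? 1.0f : 0.0f;
}

// Percentage-closer filtering: compare first, then average the results.
float pcf(const std::vector<float>& stored, float receiverDepth) {
  float sum = 0.0f;
  for (float d : stored) sum += compare(d, receiverDepth);
  return sum / static_cast<float>(stored.size());
}

// Prefiltering the depths and comparing once: still a binary answer.
float prefilteredCompare(const std::vector<float>& stored, float receiverDepth) {
  float avg = 0.0f;
  for (float d : stored) avg += d;
  avg /= static_cast<float>(stored.size());
  return compare(avg, receiverDepth);
}
```

For stored depths {0.2, 0.2, 0.9, 0.9} and a receiver at depth 0.5, PCF reports 0.5 (half lit), whereas comparing against the averaged depth of 0.55 reports fully lit.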

The original PCF algorithm, described in Reeves et al. 1987, called for mapping the region to be shaded into shadow map space and sampling that region stochastically (that is, randomly). The algorithm was first implemented using the REYES rendering engine, so the region to be shaded meant a four-sided micropolygon. Figure 11-1 shows an example of that implementation.

Figure 11-1 Percentage-Closer Filtering

In our implementation, we have changed the PCF algorithm slightly to make it easy and efficient to apply. Instead of calculating the region to be shaded in shadow map space, we simply use a 4x4-texel sample region everywhere. This region is large enough to significantly reduce aliasing, but not so large as to require huge numbers of samples or stochastic sampling techniques to achieve good results. Note that the sampling region is not aligned to texel boundaries. An aligned region would not achieve the antialiasing effect that we want.

Hardware shaders work on pixels, not on micropolygons, so matching the original implementation would involve transforming a four-sided polygon representing a screen pixel into shadow map space to calculate the sample region. Our implementation uses a fixed-size sample region instead. A fixed-size region lets us skip a complicated transformation and allows us to calculate a precise shadow percentage instead of an approximate one using stochastic sampling. See Figure 11-2.

Figure 11-2 Sampling an Area of 4x4 Texels

11.3 A Brute-Force Implementation

NVIDIA GPUs have built-in percentage-closer filtering for shadow map sampling. The hardware does four depth compares and uses the fractional part of the texture coordinate to bilinearly interpolate the results. The shadow result is the percentage of a texel-sized sample area that passes the depth test, that is, the part that is not in shadow. See Figure 11-3. A single texel-sized sample region is not big enough to effectively remove aliasing, but the region can be increased to a 4x4-texel size by averaging 16 shadow compare values. The offsets for x and y are -1.5, -0.5, 0.5, and 1.5 for samples one texel unit apart.
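The following C++ sketch models what the paragraph above describes: a bilinearly weighted set of four depth compares, and the average of 16 such lookups over a 4x4-texel region. It is a CPU model under stated assumptions (texel centers at half-integer coordinates, clamp-to-edge addressing, 1 meaning lit, no bias, and full-precision weights instead of the hardware's limited-precision filtering); the names are ours, not an NVIDIA API.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// A shadow map of stored depths, row-major.
struct ShadowMap {
  int width, height;
  std::vector<float> depth;

  // Clamp-to-edge texel fetch.
  float at(int x, int y) const {
    x = x < 0 ? 0 : (x >= width ? width - 1 : x);
    y = y < 0 ? 0 : (y >= height ? height - 1 : y);
    return depth[y * width + x];
  }
};

// One depth compare against a texel: 1 if lit, 0 if shadowed.
float compareTexel(const ShadowMap& m, int x, int y, float receiverDepth) {
  return receiverDepth <= m.at(x, y) ? 1.0f : 0.0f;
}

// Models one filtered shadow lookup: four compares around (u, v) blended by
// the fractional texel position. u and v are in texel units.
float bilinearShadow(const ShadowMap& m, float u, float v, float receiverDepth) {
  float x = u - 0.5f, y = v - 0.5f;
  int x0 = static_cast<int>(std::floor(x)), y0 = static_cast<int>(std::floor(y));
  float fx = x - x0, fy = y - y0;
  float top = compareTexel(m, x0, y0, receiverDepth) * (1 - fx) +
              compareTexel(m, x0 + 1, y0, receiverDepth) * fx;
  float bottom = compareTexel(m, x0, y0 + 1, receiverDepth) * (1 - fx) +
                 compareTexel(m, x0 + 1, y0 + 1, receiverDepth) * fx;
  return top * (1 - fy) + bottom * fy;
}

// 4x4-texel region: average 16 filtered lookups at offsets -1.5 .. 1.5.
float pcf4x4(const ShadowMap& m, float u, float v, float receiverDepth) {
  float sum = 0.0f;
  for (float dy = -1.5f; dy <= 1.5f; dy += 1.0f)
    for (float dx = -1.5f; dx <= 1.5f; dx += 1.0f)
      sum += bilinearShadow(m, u + dx, v + dy, receiverDepth);
  return sum / 16.0f;
}

// Test scene: the left two columns hold an occluder at depth 0.2;
// the right two columns are empty (depth 1.0).
ShadowMap makeEdgeMap() {
  ShadowMap m{4, 4, std::vector<float>(16, 1.0f)};
  for (int y = 0; y < 4; ++y)
    for (int x = 0; x < 2; ++x) m.depth[y * 4 + x] = 0.2f;
  return m;
}
```

On this edge, moving the lookup point a quarter texel to the right changes the single filtered lookup from 0.5 to 0.75 but changes the 4x4 average only from 0.5 to 0.5625, which is the wider, smoother transition the larger region provides.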

Figure 11-3 Using the Hardware

The following function can be used to do a projected texture map read with an offset given in texel units. The variable texmapscale is a float2 containing 1/width and 1/height of the shadow map.

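uniform float2 texmapscale; // (1/width, 1/height) of the shadow map

float offset_lookup(sampler2D map,
                    float4 loc,
                    float2 offset)
{
  // Scale the texel offset by w so that it survives the projective divide
  return tex2Dproj(map, float4(loc.xy + offset * texmapscale * loc.w,
                               loc.z, loc.w)).x;
}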

We can implement the 16-sample version in a fragment program as follows:

float sum = 0;
float x, y;
 
for (y = -1.5; y <= 1.5; y += 1.0)
  for (x = -1.5; x <= 1.5; x += 1.0)
    sum += offset_lookup(shadowmap, shadowCoord, float2(x, y));
 
shadowCoeff = sum / 16.0;

11.4 Using Fewer Samples

The performance of the brute-force method is better than one might expect. Many of the texture fetches are in the texture cache because they are guaranteed to be close to one another. However, if we change the sampling pattern per pixel, we can attain similar results with only four samples per pixel.

The four-sample technique produces results similar to those created by dithering black-and-white data to render a grayscale image. The sample region size remains the same, but we use only four of the 16 samples per pixel. The set of four samples varies depending on the screen location. Figure 11-4 shows the sampling pattern used in the four-sample version of the shader to pick four out of 16 possible sample locations per pixel.

Figure 11-4 The Sampling Pattern Used for the Four-Sample Version of the Shader

We can implement the four-sample version as follows:

// position holds the fragment's window-space position (for example, WPOS)
float2 offset = (float2)(frac(position.xy * 0.5) > 0.25);  // mod
offset.y += offset.x;  // y ^= x in floating point

if (offset.y > 1.1)
  offset.y = 0;
shadowCoeff = (offset_lookup(shadowmap, sCoord, offset +
                             float2(-1.5, 0.5)) +
               offset_lookup(shadowmap, sCoord, offset +
                             float2(0.5, 0.5)) +
               offset_lookup(shadowmap, sCoord, offset +
                             float2(-1.5, -1.5)) +
               offset_lookup(shadowmap, sCoord, offset +
                             float2(0.5, -1.5)) ) * 0.25;
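To check that the dithered pattern really covers the whole 4x4 region, the offset logic can be run on the CPU for a block of pixels. This C++ sketch (the names are ours) assumes that position holds pixel centers at integer + 0.5; integer pixel coordinates give the same even/odd parities and therefore the same offsets. It shows that each 2x2 block of pixels uses four different offsets, so the block as a whole touches all 16 sample positions.

```cpp
#include <cassert>
#include <cmath>
#include <set>
#include <utility>

struct Offset { float x, y; };

// Per-pixel offset from the four-sample shader, taking the pixel's window
// position to be its center (px + 0.5, py + 0.5).
Offset ditherOffset(int px, int py) {
  float wx = px + 0.5f, wy = py + 0.5f;
  // frac(position * 0.5) > 0.25 is 1 for odd pixel indices, 0 for even ones.
  float ox = (wx * 0.5f - std::floor(wx * 0.5f)) > 0.25f ? 1.0f : 0.0f;
  float oy = (wy * 0.5f - std::floor(wy * 0.5f)) > 0.25f ? 1.0f : 0.0f;
  oy += ox;  // y ^= x in floating point
  if (oy > 1.1f) oy = 0.0f;
  return {ox, oy};
}

// All sample offsets (in texels) used by the 2x2 pixel block at (px, py).
std::set<std::pair<float, float>> samplesFor2x2Block(int px, int py) {
  const float base[4][2] = {{-1.5f, 0.5f}, {0.5f, 0.5f},
                            {-1.5f, -1.5f}, {0.5f, -1.5f}};
  std::set<std::pair<float, float>> samples;
  for (int j = 0; j < 2; ++j)
    for (int i = 0; i < 2; ++i) {
      Offset o = ditherOffset(px + i, py + j);
      for (const auto& b : base) samples.insert({o.x + b[0], o.y + b[1]});
    }
  return samples;
}
```

Because the four per-pixel sets partition the 16 positions, neighboring pixels sample complementary parts of the region, and the eye averages them much as it averages a dithered grayscale image.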

11.5 Why It Works

How can we antialias shadows with a fixed-size sample region even though texture projection can greatly magnify the shadow map? The answer is simple: When a texture is magnified, the texture map samples are close to each other for adjacent pixels. If the samples are close to each other and the sampled area is relatively large, then there can be only a very small difference between the shadow values, because the sample areas overlap a lot. Figure 11-5 shows how sample areas for adjacent pixels overlap when the shadow map is magnified.

Figure 11-5 Overlapping Sampling Regions for Adjacent Pixels

The more the shadow map is magnified, the smaller the difference between adjacent pixels, and the smoother the transition between shadowed and unshadowed regions. The hardware calculates the shadow percentage with eight bits of precision, so even with extreme magnification and high-contrast shadows, there will always be a smooth shadow transition without banding. When the sample regions of adjacent pixels are very close to each other, their shadow values will differ only in the least significant bit of eight-bits-per-component output. This is illustrated in Figures 11-6, 11-7, and 11-8, which show shadows for a ninja model with 1, 4, and 16 samples, respectively. Figure 11-9 shows a magnification of the ninja's thumb shadow in each of the three cases. Notice the vastly improved shadow quality in the 16-sample case.

Figure 11-6 Ninja Shadow with One Sample per Pixel

Figure 11-7 Ninja Shadow with Four Dithered Samples per Pixel

Figure 11-8 Ninja Shadow with Sixteen Samples per Pixel

Figure 11-9 The Shadows Magnified

11.6 Conclusion

Shadow mapping is a popular method for rendering shadows, but it suffers from aliasing artifacts. We can greatly reduce shadow map aliasing by averaging multiple shadow map values. If we take advantage of the GPU's shadow-mapping hardware and use clever sampling techniques, we can render soft-edged, antialiased shadows at high frame rates.

11.7 References

Fernando, Randima, and Mark Kilgard. 2003. The Cg Tutorial. Addison-Wesley. This introduction to the Cg language has a good section on shadow mapping.

Reeves, W. T., D. H. Salesin, and P. L. Cook. 1987. "Rendering Antialiased Shadows with Depth Maps." Computer Graphics 21(4) (Proceedings of SIGGRAPH 87).

Stamminger, Marc, and George Drettakis. 2002. "Perspective Shadow Maps." In Proceedings of SIGGRAPH 2002, pp. 557–562.

Chapter 12. Omnidirectional Shadow Mapping

Philipp S. Gerasimov
iXBT.com

12.1 Introduction

One of the most difficult problems in real-time computer graphics is generating high-quality shadows. Yet, the appearance of such shadows is one of the most important factors in achieving graphic realism. In computer-generated scenes, an object's shadow enhances our perception of the object and the relationship between objects. In computer games, shadows—along with lighting, music, and special effects—play a very important role in portraying a realistic game atmosphere. For example, shadows are a major part of the story line in id Software's Doom 3, one of the most technologically advanced games. Figures 12-1 and 12-2 show examples of shadows from our own demo, which is provided on the book's CD and Web site.

Figure 12-1 Screenshot of Our Demo, Showing a Light Source Flying Above a Character

Figure 12-2 A Close-Up of the Character in Our Demo

GPUs now allow us to create images previously available only in professional 3D offline-rendering programs. The geometry processors in modern GPUs can process millions of primitives per frame, letting us design complex worlds. With the advent of per-pixel shading, we can produce realistic materials using complex mathematical and physically based models of lighting.

Two popular methods are available for visualizing shadows in real-time computer graphics: stencil shadows and shadow mapping.

12.1.1 Stencil Shadows

The stencil shadows method, which is demonstrated in Doom 3, is used widely by game developers. It offers advantages such as the large number of GPUs that support it (the only essential hardware feature is support for an eight-bit stencil buffer), its independence from the type of light source, and the high quality of its generated shadows. However, the stencil shadows approach has some serious disadvantages: it's heavily dependent on CPU work, it can produce only hard shadows, it uses a large amount of fill rate (which means that even though a GPU may support the technique, it could run poorly), and it cannot be used with hardware-tessellated surfaces.

12.1.2 Shadow Mapping

The shadow-mapping algorithm came to computer graphics in 1978 when it was introduced by Lance Williams (Williams 1978). Today, this method is used in a multitude of Hollywood movies that contain computer graphics and special effects. Shadow mapping projects a special dynamically created texture on scene geometry to calculate shadows. It lets you render hard and soft shadows, as well as shadows from different types of light sources. Plus, it works with hardware-tessellated surfaces and with GPU-animated meshes (such as skinned meshes).

A number of GPU manufacturers, including NVIDIA, support shadow mapping directly in their hardware and promise to enhance this support in the future. The NVIDIA GeForce3, GeForce4 Ti, and all the GeForce FX (and more recent) GPUs support hardware shadow maps through both DirectX and OpenGL. (However, we do not use the native hardware shadow-mapping functionality in this chapter.) The possibilities offered by the NVIDIA CineFX architecture—including support for long fragment programs with true floating-point precision as well as floating-point texture formats—enable a new level of quality in shadow rendering.

12.2 The Shadow-Mapping Algorithm

12.2.1 Conditions

Shadow mapping lets us visualize shadows cast from different types of light sources, such as directional lights, point lights, and spotlights. The type of light source dictates the technology we need to use. This chapter focuses on visualizing shadows cast from point light sources. Point light sources are widely used in computer games, and the quality of shadows cast by objects illuminated by these lights is very important.

We also have these additional conditions:

12.2.2 The Algorithm

There are two primary phases in using omnidirectional shadow maps: creating the shadow map and projecting it. In the creation phase, we render the squared distance from the light source of all objects that cast shadows into the shadow map texture (we'll see why the distance is squared a little later). In the projection phase, we render all the objects that receive shadows and compare the squared distance from each rendered pixel to the light source with the squared distance stored in the shadow map.
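The comparison at the heart of both phases can be sketched in a few lines of C++. This is an illustration rather than the chapter's shader code: Vec3, the function names, and the additive bias (expressed in squared-distance units) are assumptions. Because squaring is monotonic for non-negative values, comparing squared distances gives the same answer as comparing distances, without a square root.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

float squaredDistance(Vec3 a, Vec3 b) {
  float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return dx * dx + dy * dy + dz * dz;
}

// Creation phase: the value stored in the shadow map texel covered by
// the occluder.
float shadowMapValue(Vec3 occluder, Vec3 light) {
  return squaredDistance(occluder, light);
}

// Projection phase: 1 (lit) if the receiver is no farther from the light
// than the stored occluder, allowing for a bias in squared-distance units.
float shadowTerm(float storedSqDist, Vec3 receiver, Vec3 light, float bias) {
  return squaredDistance(receiver, light) <= storedSqDist + bias ? 1.0f : 0.0f;
}
```

The bias lets the occluder's own surface pass the test despite precision loss; here a receiver at the occluder's exact position is reported as lit, while one behind it is shadowed.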

The following technique fills all six faces of a cube map, in all directions: +x, -x, +y, -y, +z, -z. The shadow maps can be either precalculated (for static scenes) or re-rendered every frame. We focus primarily on re-rendering the shadow map each frame for fully dynamic shadows. All objects cast a shadow, and receive a shadow, from each light source. And all objects self-shadow. We use a single shadow map for all light sources, creating an image with multipass rendering and performing one pass for each light source.
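When the shadow map is read back, the lookup direction from the light to the shaded point selects one of the six faces by its largest-magnitude component. A C++ sketch of that selection follows; the enum and function are hypothetical, and ties between equal components are broken arbitrarily here, which may not match a particular GPU.

```cpp
#include <cassert>
#include <cmath>

enum class CubeFace { PosX, NegX, PosY, NegY, PosZ, NegZ };

// Picks the cube map face for a direction (x, y, z) from the light,
// using the component with the largest magnitude.
CubeFace selectFace(float x, float y, float z) {
  float ax = std::fabs(x), ay = std::fabs(y), az = std::fabs(z);
  if (ax >= ay && ax >= az) return x >= 0.0f ? CubeFace::PosX : CubeFace::NegX;
  if (ay >= az) return y >= 0.0f ? CubeFace::PosY : CubeFace::NegY;
  return z >= 0.0f ? CubeFace::PosZ : CubeFace::NegZ;
}
```

Filling the map therefore takes six renders of the shadow casters per light, one per face, each with a 90-degree field of view so that the faces meet without gaps.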

Listing 12-1 is an example of pseudocode for this algorithm.

Because we use a multipass algorithm (that is, making one pass for each light source), all objects must be composited into the frame buffer. To reduce overdraw and improve performance, we render a depth-only pass first. This standard technique ensures that all subsequent lighting passes occur only on visible pixels. Rendering to depth-only is very fast (many GeForce FX GPUs have double-speed "depth-only" rendering features), so it requires minimal overhead, even in low-overdraw situations. Transparent objects are not rendered in the depth-only pass, because transparent objects do not update the depth buffer. See Listing 12-2.

Listing 12-1. Pseudocode for the Omnidirectional Shadow-Mapping Algorithm

for (iLight = 0; iLight < NumberOfLights; iLight++) {
  // Fill the shadow map.
  for (iObject = 0; iObject < NumberOfObjects; iObject++) {
    RenderObjectToShadowMap(iLight, iObject);
  }

  // Lighting and shadow mapping.
  for (iObject = 0; iObject < NumberOfObjects; iObject++) {
    LightAndShadeObject(iLight, iObject);
  }
}

Listing 12-2. Depth-Only Rendering

// Clear color and depth buffers
ClearAllBuffers();

// Fill z-buffer
for (iObject = 0; iObject < NumberOfObjects; iObject++) {
  RenderObjectToZBufferOnly(iObject);
}

12.2.3 Texture Format

The type of texture format used is an important factor in this algorithm. We consider two formats: floating-point textures and integer 32-bit RGBA textures with packing/unpacking of the depth value into the color channels.

The floating-point texture format is ideal for shadow mapping because it allows for high-precision depth values. However, these textures are much slower than integer RGBA textures and are supported by only a limited number of GPUs. On the other hand, integer 32-bit RGBA textures are fast and are supported by most 3D hardware.

To preserve high precision with integer textures, however, we must pack depth values into the color channels of the texture and unpack each value when performing the depth compare for shadow mapping. We consider both methods and let you choose the one that's more convenient.
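One common way to pack a depth value into four 8-bit channels is to treat the channels as successive base-256 digits of a value in [0, 1). The C++ sketch below shows the idea in double precision. It is a generic illustration, not necessarily the packing this chapter's shaders use, and a GPU version is limited by 32-bit float precision to roughly 24 significant bits.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Splits a depth in [0, 1) into four base-256 digits (R most significant).
std::array<std::uint8_t, 4> packDepth(double depth) {
  assert(depth >= 0.0 && depth < 1.0);
  std::array<std::uint8_t, 4> rgba{};
  double v = depth;
  for (int i = 0; i < 4; ++i) {
    v *= 256.0;
    double digit = std::floor(v);
    rgba[i] = static_cast<std::uint8_t>(digit);
    v -= digit;
  }
  return rgba;
}

// Reassembles the depth: R/256 + G/256^2 + B/256^3 + A/256^4.
double unpackDepth(const std::array<std::uint8_t, 4>& rgba) {
  double v = 0.0, scale = 1.0 / 256.0;
  for (std::uint8_t c : rgba) {
    v += c * scale;
    scale /= 256.0;
  }
  return v;
}
```

Truncating after four digits loses less than 256^-4 = 2^-32, so in this CPU model the round trip is limited only by that truncation; depth 0.5, for example, packs to (128, 0, 0, 0).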

12.2.4 The Size of the Shadow Map

The size of the shadow map influences the shadow's quality and rendering speed. The size depends on the capabilities of the target hardware, the required quality, and the position of the shadow in relationship to the camera. Of course, a larger shadow map generally produces better results.

Because we use cube map textures, we have to keep in mind that we have six color surfaces and an additional z-buffer. For 32-bit textures and a 1024x1024 resolution, we'll need 4 x (6 + 1) x 1024 x 1024 bytes of video memory, or 28 MB! This highlights the importance of using a single shadow map for all light sources.
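The arithmetic behind the 28 MB figure can be written as a small C++ helper. This is a sketch; as in the text, it assumes the depth surface uses the same 4 bytes per texel as each color face, and "MB" means 2^20 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Video memory for a cube shadow map: six color faces plus one depth
// surface, each size x size texels at bytesPerTexel bytes.
constexpr std::size_t cubeShadowMapBytes(std::size_t size,
                                         std::size_t bytesPerTexel) {
  return bytesPerTexel * (6 + 1) * size * size;
}

// 4 x 7 x 1024 x 1024 = 29,360,128 bytes = 28 x 2^20 bytes.
static_assert(cubeShadowMapBytes(1024, 4) == 28u * 1024u * 1024u,
              "a 1024x1024, 32-bit cube shadow map needs 28 MB");
```

Because the cost grows with the square of the resolution, dropping to 512x512 cuts the total to a quarter, 7 MB.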

Section 12.3 examines each step of our algorithm.

12.2.5 The Range of Values for Geometry

To minimize rendering artifacts, we put all our geometry into a -0.5 to +0.5 range (or 0 to 1). This adds accuracy to our calculations, especially if we use 16-bit precision and integer textures. We can scale our geometry at load time or in the vertex shader, using vertex shader code such as this:

o.vPositionWorld = mul(vPosition, matWorld) * fGeometryScale;
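The text does not say how fGeometryScale is chosen. One plausible approach, sketched here in Python as a hypothetical helper, derives it from the scene's axis-aligned bounding box. Because the shader line applies only a scale and no offset, the sketch assumes the scene is roughly centered on the world origin:

```python
def geometry_scale(bbox_min, bbox_max, half_range=0.5):
    """Hypothetical helper: a uniform scale that maps an axis-aligned
    world-space bounding box into -half_range..+half_range. Assumes the
    scene is roughly centered on the origin (scale only, no offset)."""
    extent = max(max(abs(lo), abs(hi)) for lo, hi in zip(bbox_min, bbox_max))
    return half_range / extent


scene_min, scene_max = (-200.0, 0.0, -150.0), (200.0, 80.0, 150.0)
scale = geometry_scale(scene_min, scene_max)
print(scale)  # 0.0025
```

The resulting value would be passed to the vertex shader as fGeometryScale, or applied once to the vertex data at load time.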

12.3 Implementation

12.3.1 System Requirements

These are our system requirements:

12.3.2 Resource Creation

We can create all the required objects and textures (the shadow map texture, the depth buffer, and the shaders) using several useful Direct3D library functions:

D3DXCreateCubeTexture()
D3DXCreateRenderToEnvMap()
D3DXCreateEffectFromFile()

12.3.3 Rendering Phase 1: Rendering into the Shadow Map

Next, we render into the shadow map. We'll render our objects into each face of the cube map from the point of view of the light source, following these requirements:

The Vertex Shader

In the vertex shader, we write out the scaled world-space position of the vertex for the pixel shader. Alternatively, we can write out the light direction and save the pixel shader instruction that would otherwise compute the world-space light vector.

The Pixel Shader

We can use either a floating-point texture or an integer texture.

By writing frac(fDepth) into the green and alpha channels, we save this pixel shader instruction (otherwise, we need an additional instruction to fill these channels):

mov r2.gba, r0.g  // r0.g contains frac(fDepth)

Method 1 is computationally cheaper, but the second one gives you higher precision.

12.3.4 Rendering Phase 2: Base Rendering

The base rendering phase has two main parts:

  1. Rendering objects only to the z-buffer (z-only pass), which requires these steps:
    1. Disabling rendering into the color channel
    2. Enabling rendering into the z-buffer
    3. Rendering all objects into the z-buffer (only)
  2. Making a shading (lighting times shadow) pass for each light source

12.3.5 The Lighting Calculation

We need to calculate the lighting at each pixel from the light source, and we can use any lighting model (such as per-pixel Phong, Blinn, or Oren-Nayar).

12.3.6 The Shadow Calculation

Calculating the shadow requires these steps:

  1. Calculate the squared distance from the current pixel to the light source.
  2. Project the shadow map texture onto the current pixel.
  3. Fetch the shadow map texture value at the current pixel.
  4. Compare the calculated distance value with the fetched shadow map value to determine whether or not we're in shadow.

For floating-point textures, we just use the x component of the fetched texture sample.

Here is the pixel shader code:

float fDepth = fDistSquared - fDepthBias;
float3 vShadowSample = texCUBE(ShadowMapSampler, -vLight.xyz);
float fShadow = (fDepth - vShadowSample.x < 0.0f) ? 1.0f : 0.0f;

fDistSquared was computed previously in the pixel shader. For integer textures, we must unpack the value from the color channels of the fetched texture sample, using one of two formulas depending on how the depth was packed:

  1. DepthValue = ShadowSample.r / 1 +
    ShadowSample.g / 256 +
    ShadowSample.b / 65536 +
    ShadowSample.a / 16777216
  2. DepthValue = ShadowSample.r * 256 + ShadowSample.g
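To make the two unpacking formulas concrete, here is a Python sketch of matching packers. The packing side is our reconstruction, because only the unpacking formulas are given here. For clarity, it models an 8-bit channel as a multiple of 1/256; a real UNORM8 texture returns byte/255, which would shift the weights slightly. The four-channel variant assumes a depth in [0, 1), and the two-channel variant assumes a depth below 256.

```python
import math

UNPACK_4 = (1.0, 1.0 / 256, 1.0 / 65536, 1.0 / 16777216)  # formula 1 weights
UNPACK_2 = (256.0, 1.0, 0.0, 0.0)                          # formula 2 weights


def quantize8(x):
    """Idealized 8-bit channel: keep whole multiples of 1/256 (assumption)."""
    return math.floor(x * 256.0) / 256.0


def pack_depth_4(depth):
    """Split a depth in [0, 1) into four base-256 digits, one per channel."""
    channels, rest = [], depth
    for _ in range(4):
        rest *= 256.0
        digit = math.floor(rest)
        channels.append(digit / 256.0)
        rest -= digit
    return channels


def pack_depth_2(depth):
    """Integer part (scaled by 1/256) in red; fraction in green, blue, alpha."""
    frac = quantize8(depth - math.floor(depth))
    return [math.floor(depth) / 256.0, frac, frac, frac]


def unpack(channels, weights):
    """Equivalent of dot(vShadowSample, vUnpack) in the pixel shader."""
    return sum(c * w for c, w in zip(channels, weights))


print(unpack(pack_depth_4(0.123456789), UNPACK_4))  # error below 2**-32
print(unpack(pack_depth_2(37.3), UNPACK_2))          # error below 1/256
```

The comparison shows the precision trade-off: four digits of base 256 keep about 32 bits of the depth, while the two-channel layout keeps only 8 fractional bits.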

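Here is the pixel shader code, where vUnpack holds the weights of the chosen formula: (1, 1/256, 1/65536, 1/16777216) for the first and (256, 1, 0, 0) for the second: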

float fDepth = fDistSquared - fDepthBias;
float4 vShadowSample = texCUBE(ShadowMapSampler, -vLight.xyz);
float fShadow = (fDepth - dot(vShadowSample,
                              vUnpack) < 0.0f) ? 1.0f : 0.0f;

12.3.7 Tips and Tricks

  1. There are a number of different ways you can compute depth bias:
    • fDistSquared - vShadowSample.x: no bias at all, so artifacts are very likely.
    • (fDistSquared - DepthBias) - vShadowSample.x: a constant bias, which suits the squared distance poorly because it is not linear in distance.
    • (fDistSquared * DepthBias) - vShadowSample.x: a bias proportional to the squared distance; this method works best in practice.
  2. Light direction: Move the light direction calculation into the vertex shader. The light direction is linear and can easily be calculated per vertex.
  3. Opposite light direction: We need the opposite light direction for fetching from the shadow map. But the texld pixel shader instruction does not support the "negate" modifier, so if we use texCUBE(ShadowMapSampler, -vLight.xyz), we'll get an extra "add" instruction with every fetch. So, we can move this calculation into the vertex shader and interpolate -vLight.xyz instead of vLight.xyz.
  4. Preprocessor directives with HLSL and Cg shaders: Use preprocessor directives for different options—such as floating-point/integer textures, hard/soft shadows, and full/half/fixed precision—to reduce the number of shaders you need to write.
  5. Pixel shader precision: Use half precision for most shadow calculations. It's sufficient, and you will get extra speed on some hardware. If you see artifacts, however, use full precision.
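To see why scaling the squared distance tends to behave better than subtracting a constant, here is a Python sketch. It assumes the shadow map stores squared distances as 16-bit floats, whose rounding error grows with the stored value. The bias values 0.0005 and 0.998 are arbitrary illustrations, not values from the chapter.

```python
import struct


def to_half(x):
    """Round x to the nearest IEEE 754 half-precision value (assumed storage)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def lit_no_bias(dist_sq, stored):
    return dist_sq - stored < 0.0


def lit_constant_bias(dist_sq, stored, depth_bias=0.0005):  # illustrative value
    return (dist_sq - depth_bias) - stored < 0.0


def lit_scaled_bias(dist_sq, stored, depth_bias=0.998):     # illustrative value
    return (dist_sq * depth_bias) - stored < 0.0


# Case 1: a far surface shadowing itself; its stored value rounds down to 2.0.
far = 2.0009
stored_far = to_half(far)
# Case 2: a receiver just behind an occluder that is close to the light.
occluder, receiver = 0.0100, 0.0104
stored_near = to_half(occluder)

for name, lit in (("none", lit_no_bias), ("constant", lit_constant_bias),
                  ("scaled", lit_scaled_bias)):
    print(name, "far surface lit:", lit(far, stored_far),
          "near receiver leaks light:", lit(receiver, stored_near))
```

With these numbers, the missing and constant biases both shadow the far surface with itself, and the constant bias also lets light leak onto the near receiver. The scaled bias avoids both problems, because its effective offset grows with distance in the same way the storage error does.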

12.3.8 Finalizing the Shading Pass (Lighting x Shadow)

The last step is to write the pixel color value based on the calculated lighting and shadowing. For each light source, we add the calculated lighting into the back buffer by repeating the shadow-writing and shading passes for all objects. When we finish processing all the light sources, we get a scene with dynamic lighting and shadowing.

12.4 Adding Soft Shadows

Looking at our scene, we notice that the shadows' edges are aliased and "hard." The level of aliasing depends on the size of the shadow map and the amount of magnification during projection. To reduce the appearance of these artifacts, we create a "softer" shadow by fetching multiple samples from the shadow map and averaging the results. Because real-world light sources are rarely perfect point sources, this action will also provide a more realistic shadow.

Listing 12-3 shows some sample code.

Listing 12-3. Making a Softer Shadow

float fShadow = 0;

for (int i = 0; i < 4; i++) {
  float3 vLightDirection = -vLight.xyz + vFilter[i];
  float4 vShadowSample = texCUBE(ShadowMapSampler, vLightDirection);
  fShadow += (fDepth - vShadowSample.x < 0.0f) ? 0.25f : 0.0f;
}

Note that we first compare the squared distances and then average the results of the comparison. This is called percentage-closer filtering and is the correct way to average multiple shadow map tests. (See Chapter 11 of this book, "Shadow Map Antialiasing," for a detailed discussion of this technique.)
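The difference between comparing first and averaging first shows up clearly at a shadow edge. Here is a minimal Python sketch that contrasts percentage-closer filtering with the incorrect alternative of averaging the stored depths. The sample values are made up for illustration.

```python
def pcf(depth, shadow_samples):
    """Percentage-closer filtering: compare first, then average the 0/1 results."""
    lit = [1.0 if depth - s < 0.0 else 0.0 for s in shadow_samples]
    return sum(lit) / len(lit)


def average_then_compare(depth, shadow_samples):
    """Incorrect alternative: average the stored depths, then compare once."""
    mean = sum(shadow_samples) / len(shadow_samples)
    return 1.0 if depth - mean < 0.0 else 0.0


# A receiver at squared distance 0.5 near a shadow edge: two samples hit an
# occluder (0.2), and two see past it (0.9).
samples = [0.2, 0.2, 0.9, 0.9]
print(pcf(0.5, samples))                   # 0.5 (half lit, a soft edge)
print(average_then_compare(0.5, samples))  # 1.0 (still a hard 0/1 answer)
```

Averaging depths produces a depth that no surface actually has and still yields a binary result. PCF, by contrast, reports the fraction of samples that see the light.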

We can save some pixel shader instructions by moving the calculation of the -vLight.xyz + vFilter[i] values into the vertex shader.

If we choose different range values for vFilter[i], we'll get different levels of softness for the shadow, ranging from a slight antialiasing effect to a very blurry shadow. The larger the filter kernel we define, the more samples we need to take to avoid banding artifacts. Obviously, taking more samples equals processing more instructions and more texture fetches, which can reduce performance in shader-bound situations. Although this technique can produce a "softer" look for the shadows, the shadows are of course not accurate soft shadows, because they do not take into account the relationships between occluders, receivers, and the size of the light source. (See Chapter 13, "Generating Soft Shadows Using Occlusion Interval Maps," for more on soft shadows.)

12.5 Conclusion

With the new capabilities of DirectX 9–class hardware, new algorithms for improving visual quality become possible and easier to implement. Using hardware shaders, we can create realistic, dynamic shadows from any number of point light sources, and we can even implement basic "soft" shadows.

With the current first-generation DirectX 9–class hardware, this algorithm is not quite fast enough to be practical (although it is definitely real time). That's because of the large number of renderings from the point of view of the light, and the long pixel shaders necessary for "soft" shadowing effects. But as always, much faster graphics hardware is right around the corner, and advances in performance will make these algorithms practical for implementation in real, shipping games.

12.6 References

Williams, Lance. 1978. "Casting Curved Shadows on Curved Surfaces." In Proceedings of the 5th Annual Conference on Computer Graphics and Interactive Techniques, pp. 270–274.

The author would like to thank Chris Wynn and John Spitzer of NVIDIA and Guennadi Riguer of ATI.

Chapter 13. Generating Soft Shadows Using Occlusion Interval Maps

William Donnelly
University of Waterloo

Joe Demers
NVIDIA

In this chapter we present a technique for rendering soft shadows that we call occlusion interval mapping. Occlusion interval maps were used to produce soft shadows in the NVIDIA GeForce FX 5900 demo "Last Chance Gas." See Figure 13-1. We call the technique occlusion interval mapping because it uses texture maps to store intervals that represent when the light source is visible and when it is occluded. In situations that satisfy the algorithm's requirements, occlusion interval mapping allows you to achieve impressive visual results at high frame rates.

Figure 13-1 The "Last Chance Gas" Demo

13.1 The Gas Station

One of the goals of the GeForce FX 5900 demo "Last Chance Gas" was to create a scene with accurate outdoor lighting. One important aspect of outdoor lighting we wanted to capture is soft shadows from the Sun. Unlike the hard shadows produced by shadow maps or stencil shadow volumes, soft shadows have a penumbra region, which gives a smooth transition between shadowed and unshadowed regions. A correct penumbra makes for more realistic shadowing and gives the user a better sense of spatial relationships, making the experience more immersive.

When we started writing the demo, we could not find any appropriate real-time soft shadow algorithm, even though a lot of research had been dedicated to the problem. Because the soft shadow problem is such a difficult one, we considered how we could simplify the problem to make it more feasible for real time.

Given that the gas station scene is static, we considered using a precomputed visibility technique such as spherical harmonic lighting (Sloan et al. 2002). Unfortunately for our purposes, spherical harmonic lighting assumes very low frequency lighting, and so it is not suitable for small area lights such as the Sun. We developed occlusion interval maps as a new precomputed visibility technique that would allow for real-time soft shadows from the Sun. Our method achieves this goal by reducing the problem to the case of a linear light source on a fixed trajectory.

The algorithm also bears some similarity to horizon maps (Max 1988). Unlike horizon maps, which cover the entire visible hemisphere, occlusion interval maps work only for lights along a single path on the visible hemisphere. Occlusion interval maps also can handle arbitrary geometry, not just height fields.

Occlusion interval maps are not meant to be a general solution to the soft shadow problem, but we found them useful for shadowing in a static outdoor environment. As with other precomputed visibility techniques, occlusion interval maps rely on an offline process to store all visibility information, and so they won't work for moving objects or arbitrary light sources.

13.2 The Algorithm

Suppose we have a light source such as the Sun that follows a fixed trajectory, and suppose that we want to precompute hard shadows for this light source. We can express the shadowing as a visibility function, which has a value of 0 when a point is in shadow and a value of 1 when it is illuminated. During rendering, this visibility function is computed, and the result is multiplied by a shading calculation to give the final color value.

The visibility function is a function of three variables: two spatial dimensions for the surface of the object and one dimension for time. Although we could store this function as a 3D texture, the memory requirements would be huge. Instead, because all the values of the visibility function are either 0 or 1, we can store the function using a method similar to run-length encoding. For each point, we find rising and falling edges in the time domain. These correspond to the times of day when the Sun appears and disappears, respectively. We define the "rise" vector as the vector of all rising edges, and the corresponding "fall" vector as the vector of all falling edges. See Figure 13-2.
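The run-length encoding described above can be sketched in a few lines of Python. This sketch makes two assumptions: time is normalized to [0, 1] over the Sun's trajectory, and each edge is timestamped at the first sample where visibility changes. The chapter's own precomputation, which ran in mental ray, may quantize edges differently.

```python
def rise_fall_vectors(visibility):
    """Run-length encode a sampled 0/1 visibility function.

    visibility[i] is the visibility at the i-th of n evenly spaced light
    positions along the trajectory; edges are returned as times in [0, 1].
    """
    n = len(visibility)
    rises, falls = [], []
    previous = 0
    for i, v in enumerate(visibility):
        if v and not previous:
            rises.append(i / n)        # the Sun appears
        elif previous and not v:
            falls.append(i / n)        # the Sun disappears
        previous = v
    if previous:                        # still visible at the end of the day
        falls.append(1.0)
    return rises, falls


print(rise_fall_vectors([0, 1, 1, 0, 0, 1, 1, 1]))
# ([0.125, 0.625], [0.375, 1.0])
```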

Figure 13-2 A Single Point in the Scene and Its Visibility Function

We now have all we need for precomputed visibility of hard shadows; given a rise vector, a fall vector, and time of day, we can compute the visibility function to determine if the point is in shadow.
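Evaluating the hard-shadow visibility from the stored vectors then reduces to an interval test. The following Python sketch uses example rise and fall times and treats each interval as open, matching a boxcar that is 1 strictly between the rise and the fall:

```python
def hard_visibility(rises, falls, t):
    """Point-light visibility: 1 if t lies strictly inside any (rise, fall) interval."""
    return 1.0 if any(r < t < f for r, f in zip(rises, falls)) else 0.0


rises, falls = [0.125, 0.625], [0.375, 1.0]
print([hard_visibility(rises, falls, t) for t in (0.1, 0.25, 0.5, 0.8)])
# [0.0, 1.0, 0.0, 1.0]
```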

In order to turn this into a soft shadow algorithm, we extend the light source along its trajectory. Now, instead of computing shadows from a point light source, we compute shadows from a linear light source. Imagine a time interval $[t - dt/2,\ t + dt/2]$. Over this time, the light source will sweep out a curve in space. If we take the average lighting over this interval, we will have computed the correct shadowing from the linear light source. This means that we can use the same information that renders a hard-shadowed image to render a soft-shadowed one. See Figure 13-3.

Figure 13-3 The Point Light Visibility Function and Corresponding Linear Light Visibility Function

13.3 Creating the Maps

In order to generate occlusion interval maps, we have to compute the visibility function from every point on the light source trajectory to every pixel in the occlusion interval map. We do this by taking a sequence of evenly spaced points along the curve, tracing a ray from each occlusion interval map pixel to each of these points, and detecting the rising and falling edges of each visibility function. Because we store rising and falling edge values in eight bits, 256 rays are enough to capture all of the intervals at that precision. To reduce the amount of information stored, we do not store rising and falling edges when a point's normal is facing away from the light source. Because back-facing pixels will be dark anyway, this decision saves space and has no effect on the rendered image.

Computing visibility functions can be done by any ray tracer with the right level of programmability. We computed all data for our scene using a custom shader in the mental ray software package. Computing these textures can be time-consuming, because you have to cast 256 rays for every pixel in the occlusion interval map. For the scene in "Last Chance Gas," it took several hours to compute the shadowing for the entire scene.

We store the rise vector and the fall vector in two sets of color textures, each texture having four channels. A "rise" texture stores the beginning of a light interval in each channel, and the matching "fall" texture stores the ends of the light intervals. It will become obvious why we divide the textures up like this when we describe the algorithm for rendering with occlusion interval maps.

To alleviate the extra memory requirements of storing the occlusion interval maps, we compute our maps at half the width and height of the color textures, which reduces the storage requirements by a factor of four. The resolution of occlusion interval maps can be reduced because the softness of the shadows has a blurring effect that makes up for the lower resolution. In some cases, we found that we could lower the resolution of the maps even further without noticeable artifacts.

For parameterizing the objects, we use the objects' texture coordinates. Because the pixels of the occlusion interval map store information that depends on position, the objects' texture coordinates cannot overlap. For objects with tiled textures, this means computing a new set of unique texture coordinates.

13.4 Rendering

When rendering, we have to average the visibility function over an interval of parameter values. In mathematical terms, this means performing a convolution with a window function of width dt. The equation for this calculation is:

$$V_{\text{LinearLight}}(t) = \int_{-\infty}^{\infty} V_{\text{PointLight}}(\tau)\, W_{dt}(t - \tau)\, d\tau$$

where $V_{\text{PointLight}}$ is the visibility function and $W_{dt}$ is a box filter of width $dt$, defined as $W_{dt}(t) = 1/dt$ for $-dt/2 < t < dt/2$ and $W_{dt}(t) = 0$ otherwise. Given a rise vector $R = (R_1, R_2, \ldots, R_n)$ and a fall vector $F = (F_1, F_2, \ldots, F_n)$, we can express $V_{\text{PointLight}}$ as:

$$V_{\text{PointLight}}(t) = \sum_{i=1}^{n} B(R_i, F_i, t)$$

where $B(a, b, t)$ is the boxcar function, defined as $B(a, b, t) = 1$ for $a < t < b$ and $B(a, b, t) = 0$ otherwise. We can now evaluate $V_{\text{LinearLight}}$ as follows:

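$$V_{\text{LinearLight}}(t) = \frac{1}{dt} \sum_{i=1}^{n} \max\left(0,\ \min\left(F_i,\ t + \frac{dt}{2}\right) - \max\left(R_i,\ t - \frac{dt}{2}\right)\right)$$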

Using the preceding equation, we can easily calculate soft shadowing for a single rise/fall pair using just min, max, and subtraction. Fortunately, we can optimize this even further. Because shader instructions operate on four-component vectors, four intervals can be processed simultaneously at the cost of one. This is why we pack the rises and falls into separate textures. The final Cg code is shown in Listing 13-1.

Listing 13-1. Function for Computing Soft Shadows Using Occlusion Interval Maps

half softshadow(sampler2D riseTexture,
                sampler2D fallTexture,
                float2 texCoord,
                half intervalStart,
                half intervalEnd,
                half intervalInverseWidth)
{

  half4 rise = h4tex2D (riseTexture, texCoord);
  half4 fall = h4tex2D (fallTexture, texCoord);
  half4 minTerm = min (fall, intervalEnd);
  half4 maxTerm = max (rise, intervalStart);
  return dot (intervalInverseWidth, saturate (minTerm - maxTerm));
}

Note that saturate(x) is used in place of max(0, x). The two operations will always be equivalent because the quantity being considered is the width of the visible light source interval, which is always less than 1. We choose to use saturate(x) over max(0, x) because it can be applied as an output modifier, saving an instruction. We used the 16-bit half data type and found it perfectly suited to our needs, because our calculations exceeded the range of fixed precision but did not require full 32-bit floating point.

The dot product on the last line of Listing 13-1 is not used for its usual geometric purpose; we use it to simultaneously divide by the light source width and add together the shadow values that are computed in parallel. We pass the values to the shader as intervalStart = t - dt/2, intervalEnd = t + dt/2, and intervalInverseWidth = 1/dt.

This shader compiles to only six assembly-code instructions for the GeForce FX: two texture lookups, a min, a max, a subtraction with a saturate modifier, and a dot product. The function computes up to four intervals' worth of shadows. If there are multiple rise and fall textures, we just call the function multiple times and add the results together.
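For experimenting with the math outside a shader, here is a Python transcription of the four-lane function, together with the interval parameters described above. The padding convention for unused lanes, an empty interval with rise > fall, is our assumption; the chapter does not say how unused channels are filled.

```python
def saturate(x):
    """Clamp to [0, 1], like the Cg saturate() intrinsic."""
    return min(max(x, 0.0), 1.0)


def soft_shadow(rise, fall, interval_start, interval_end, interval_inverse_width):
    """Four-lane transcription of softshadow(): overlap of each [rise, fall]
    interval with [interval_start, interval_end], summed and divided by dt."""
    min_term = [min(f, interval_end) for f in fall]
    max_term = [max(r, interval_start) for r in rise]
    return sum(interval_inverse_width * saturate(hi - lo)
               for hi, lo in zip(min_term, max_term))


def linear_light_visibility(rise, fall, t, dt):
    """Pass intervalStart, intervalEnd, and intervalInverseWidth as in the text."""
    return soft_shadow(rise, fall, t - dt / 2, t + dt / 2, 1.0 / dt)


# Two lit intervals; the unused lanes hold empty intervals (assumed padding).
rise = (0.2, 0.6, 1.0, 1.0)
fall = (0.4, 0.9, 0.0, 0.0)
for t in (0.3, 0.4, 0.5, 0.75):
    print(t, round(linear_light_visibility(rise, fall, t, dt=0.1), 6))
```

At t = 0.4 the Sun's interval straddles a falling edge, so the point is half lit. This is the penumbra. With more than four intervals, you would call the function once per rise/fall texture pair and add the results.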

13.5 Limitations

As previously discussed, the technique works only for static scenes with a single light traveling on a fixed trajectory. This means it would not work for shadowing on characters and other dynamic objects, but it is well suited for shadowing in static outdoor environments.

Because occlusion interval maps require all eight bits of precision per channel, texture compression results in visual artifacts. Thus, texture compression has to be disabled, which increases texture memory usage. This increase is offset by the lower resolution of the occlusion interval maps. The discontinuities in the occlusion interval maps mean that bilinear filtering produces artifacts as well. As a result, texture filtering must also be disabled on occlusion interval maps, which gives the shadows a blocky look. Once again, because of the smoothness of the shadowing, this effect is not as noticeable as it would be on detailed color textures.

Another visual artifact comes from the fact that the Sun is approximated by a linear light source. If you look closely at the shadow boundaries, you will see that shadows are smoother in the direction parallel to the light source path and harder in the perpendicular direction. Fortunately, this effect is subtle unless the light source is very large. For the range of widths we used, the effect is not very noticeable. Heidrich et al. (2000) also used linear lights to approximate area lights and noted that the shadowing from a linear light source looks very much like the shadowing from a true area light source. See Figure 13-4.

Figure 13-4 Lighting a Ladder

13.6 Conclusion

Rendering soft shadows in real time is an extremely difficult problem. Figures 13-5 and 13-6 show two examples of the subtleties involved. Precomputed visibility techniques produce soft shadows by imposing assumptions on the scene and on the light source. In the case of occlusion interval maps, we trade generality for performance to obtain a soft shadow algorithm that runs in real time on static scenes. Occlusion interval maps can act as a replacement for static light maps, allowing dynamic effects such as the variation of lighting from sunrise to sunset, as in "Last Chance Gas."

Figure 13-5 The Gas Station Entrance

Figure 13-6 Shadows on the Car's Tarp

13.7 References

We also considered using the soft shadow volume technique presented by Assarsson et al. (2003), but we had too much geometry in our scene for a shadow volume technique to remain real-time.

Assarsson, Ulf, Michael Dougherty, Michael Mounier, and Tomas Akenine-Möller. 2003. "An Optimized Soft Shadow Volume Algorithm with Real-Time Performance." Graphics Hardware, pp. 33–40.

Heidrich, Wolfgang, Stefan Brabec, and Hans-Peter Seidel. 2000. "Soft Shadow Maps for Linear Lights." In 11th Eurographics Workshop on Rendering, pp. 269–280.

Max, N. L. 1988. "Horizon Mapping: Shadows for Bump-Mapped Surfaces." The Visual Computer 4(2), pp. 109–117.

Sloan, Peter-Pike, Jan Kautz, and John Snyder. 2002. "Precomputed Radiance Transfer for Real-Time Rendering in Dynamic, Low-Frequency Lighting Environments." ACM Transactions on Graphics 21, pp. 527–536.

Chapter 14. Perspective Shadow Maps: Care and Feeding

Simon Kozlov
SoftLab-NSK

14.1 Introduction

Shadow generation has always been a big problem in real-time 3D graphics. Determining whether a point is in shadow is not a trivial operation for modern GPUs, particularly because GPUs work in terms of rasterizing polygons instead of ray tracing.

Today's shadows should be completely dynamic. Almost every object in the scene should cast and receive shadows, there should be self-shadowing, and every object should have soft shadows. Only two algorithms can satisfy these requirements: shadow volumes (or stencil shadows) and shadow mapping.

The difference between the algorithms for shadow volumes and shadow mapping comes down to object space versus image space:

Perspective shadow maps (PSMs), presented at SIGGRAPH 2002 by Stamminger and Drettakis (2002), try to eliminate aliasing in shadow maps by using them in post-projective space, where all nearby objects become larger than farther ones. Unfortunately, it's difficult to use the original algorithm because it works well only in certain cases.

The most significant problems of the presented PSM algorithm are these three:

Each of these problems is discussed in the next section. This chapter focuses on directional lights (because they have bigger aliasing problems), but all the ideas and algorithms can easily be applied to other types of light sources (details are provided where appropriate). In addition, we discuss tricks for increasing the quality of the shadow map by filtering and blurring the picture.

In general, this chapter describes techniques and methods that can increase the effectiveness of using PSMs. However, most of these ideas still need to be adapted to your particular needs.

14.2 Problems with the PSM Algorithm

14.2.1 Virtual Cameras

First, let's look at the essence of this problem. The usual projective transform moves objects behind the camera to the other side of the infinity plane in post-projective space. However, if the light source is behind the camera too, these objects are potential shadow casters and should be drawn into the shadow map.

In the perspective transform in Figure 14-1, the order of the points on the ray changes. The authors of the original PSM paper propose "virtually" sliding the view camera back to hold potential shadow casters in the viewing frustum, as shown in Figure 14-2, so that we can use PSMs the normal way.

Figure 14-1 An Object Behind the Camera in Post-Projective Space

Figure 14-2 Using a Virtual Camera

Virtual Camera Issues

In practice, however, using the virtual camera leads to poor shadow quality. The "virtual" shift greatly decreases the resolution of the effective shadow map, so that objects near the real camera become smaller, and we end up with a lot of unused space in the shadow map. In addition, we may have to move the camera back significantly for large shadow-casting objects behind the camera. Figure 14-3 shows how dramatically the quality changes, even with a small shift.

Figure 14-3 The Effect of the Virtual Shift on Shadow Quality

Another problem is minimizing the actual "slideback distance," which maximizes image quality. This requires us to analyze the scene, find potential shadow casters, and so on. Of course, we could use bounding volumes, scene hierarchical organizations, and similar techniques, but they would be a significant CPU hit. Moreover, we'll always have abrupt changes in shadow quality when an object stops being a potential shadow caster. In this case, the slideback distance instantly changes, causing the shadow quality to change suddenly as well.

A Solution for Virtual Camera Issues

We propose a solution to this virtual camera problem: Use a special projection transform for the light matrix. In fact, post-projective space allows some projection tricks that can't be done in the usual world space. It turns out that we can build a special projection matrix that can see "farther than infinity."

Let's look at a post-projective space formed by an original (nonvirtual) camera with a directional "inverse" light source and with objects behind the view camera, as shown in Figure 14-4.

Figure 14-4 Post-Projective Space with an Inverse Light Source

The difficulty is that the ray should come out from the light source, catch point 1, go to minus infinity, then pass on to plus infinity and return to the light source, capturing information at points 2, 3, and 4, which an ordinary projection cannot do. Fortunately, there is a projection matrix that matches this "impossible" ray, where we can set the near plane to a negative value and the far plane to a positive value. See Figure 14-5.

Figure 14-5 An Inverse Projection Matrix

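In the simplest case, $Z_n = -a$ and $Z_f = a$,

where a is small enough to fit the entire unit cube. Then we build this inverse projection as the usual projection matrix, as shown here, where matrices are written in a row-major style and $s_x$ and $s_y$ are the usual x and y scale factors:

$$M = \begin{pmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & \frac{Z_f}{Z_f - Z_n} & 1 \\ 0 & 0 & \frac{-Z_n Z_f}{Z_f - Z_n} & 0 \end{pmatrix} = \begin{pmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & \frac{1}{2} & 1 \\ 0 & 0 & \frac{a}{2} & 0 \end{pmatrix}$$

So the formula for the resulting transformed z coordinates, which go into a shadow map, is:

$$Z_{psm}(z) = \frac{z/2 + a/2}{z} = \frac{1}{2} + \frac{a}{2z}$$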

$Z_{psm}(-a) = 0$, and if we keep decreasing the z value to minus infinity, $Z_{psm}$ tends to $1/2$. The same $Z_{psm} = 1/2$ corresponds to plus infinity, and moving from plus infinity to the far plane increases $Z_{psm}$ to 1 at the far plane. This is why the ray hits all points in the correct order and why there's no need to use "virtual slides" for creating post-projective space.

This trick works only in post-projective space, because normally all points behind the infinity plane have w < 0, so they cannot be rasterized. But for the second projection transformation, the one applied by the light camera, these points are located behind the camera, so the w coordinate is inverted again and becomes positive.
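The ordering argument can be checked numerically. The following Python sketch assumes a Direct3D-style perspective matrix: row vectors, z-column entries $Z_f/(Z_f - Z_n)$ and $-Z_n Z_f/(Z_f - Z_n)$, and w = z. It uses $Z_n = -a$ and $Z_f = a$, which is our reading of the "simplest case" described in this section. The sketch then walks the "impossible" ray and checks that the stored depth increases monotonically along it.

```python
def inverse_projection_depth(z, a):
    """Shadow-map depth from a D3D-style projection whose near plane is at -a
    and far plane at +a. Equals (z * m22 + m32) / z, since clip-space w = z."""
    zn, zf = -a, a
    m22 = zf / (zf - zn)           # = 1/2
    m32 = -zn * zf / (zf - zn)     # = a/2
    return m22 + m32 / z


a = 1.0
# Walk the ray in the order it should visit points: from the near plane toward
# minus infinity, then back from plus infinity to the far plane.
ray = [-a, -10 * a, -1e6 * a, 1e6 * a, 10 * a, a]
depths = [inverse_projection_depth(z, a) for z in ray]
print([round(d, 4) for d in depths])
```

The depths start at 0, approach 1/2 from below as z goes to minus infinity, continue from just above 1/2 at plus infinity, and reach 1 at the far plane. This matches the behavior described above.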

By using this inverse projection matrix, we don't have to use virtual cameras. As a result, we get much better shadow quality without any CPU scene analysis and the associated artifacts.

The only drawback to the inverse projection matrix is that we need higher depth precision in the shadow map, because we use large z-value ranges. However, 24-bit fixed-point depth values are enough for reasonable cases.

Virtual cameras could still be useful, though, because the shadow quality depends on the location of the camera's near plane. The formula for post-projective z is:

$$z_{pp} = Q\left(1 - \frac{Z_n}{z}\right), \qquad Q = \frac{Z_f}{Z_f - Z_n}$$

As we can see, Q is very close to 1 and doesn't change significantly as long as $Z_n$ is much smaller than $Z_f$, which is typical. That's why the near and far planes would have to change significantly to affect the Q value, which usually is not possible. At the same time, the near-plane value strongly influences the post-projective space. For example, for $Z_n$ = 1 meter (m), the first meter of world space beyond the near plane occupies half the unit cube in post-projective space. Likewise, if we change $Z_n$ to 2 m, we effectively double the z-value resolution and increase the shadow quality. That means that we should make $Z_n$ as large as possible.
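The numbers in this discussion can be reproduced with a short Python sketch. It assumes the standard Direct3D perspective depth mapping $z_{pp} = Q(1 - Z_n/z)$ with $Q = Z_f/(Z_f - Z_n)$. The slope function measures how much of the unit cube one world unit receives at distance z.

```python
def q_factor(zn, zf):
    return zf / (zf - zn)


def post_projective_z(z, zn, zf):
    """D3D-style post-projective depth of a view-space z in [zn, zf]."""
    return q_factor(zn, zf) * (1.0 - zn / z)


def depth_slope(z, zn, zf):
    """d(post-projective z)/dz: the share of the unit cube per world unit at z."""
    return q_factor(zn, zf) * zn / (z * z)


print(post_projective_z(2.0, 1.0, 4000.0))  # about 0.5: the first meter takes half
print(depth_slope(10.0, 2.0, 4000.0) / depth_slope(10.0, 1.0, 4000.0))  # about 2
print(q_factor(1.0, 4000.0), q_factor(1.0, 50.0))  # about 1.00025 and 1.0204
```

Because the slope is proportional to $Q Z_n$ and Q barely changes, doubling $Z_n$ roughly doubles the depth resolution at every distance.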

The perfect method, proposed in the original PSM article, is to read back the depth buffer, scan through each pixel, and find the maximum possible Z n for each frame. Unfortunately, this method is quite expensive: it requires reading back a large amount of video memory, causes an additional CPU/GPU stall, and doesn't work well with swizzled and hierarchical depth buffers. So we should use another (perhaps less accurate) method to find a suitable near-plane value for PSM rendering.

Such other methods for finding a suitable near-plane value for PSM rendering could include various methods of CPU scene analysis:

These methods try to increase the actual near-plane value, but we could also increase the value "virtually." The idea is the same as with the old virtual cameras, but with one difference. When sliding the camera back, we increase the near-plane value so that the near-plane quads of the original and virtual cameras remain on the same plane. See Figure 14-6.

Figure 14-6 Difference Between Virtual Cameras

When we slide the virtual camera back, we improve the z-value resolution. However, this makes the value distribution for x and y values worse for near objects, thus balancing shadow quality near and far from the camera. Because of the very irregular z-value distribution in post-projective space and the large influence of the near-plane value, this balance could not be achieved without this "virtual" slideback. The usual problem of shadows looking great near the camera but having poor quality on distant objects is the typical result of an unbalanced distribution of shadow map texel area.

14.2.2 The Light Camera

Another problem with PSMs is that the shadow quality relies on the relationship between the light and camera positions. With a vertical directional light, aliasing problems are completely removed, but when light is directed toward the camera and is close to head-on, there is significant shadow map aliasing.

We're trying to hold the entire unit cube in a single shadow map texture, so we have to make the light's field of view as large as necessary to fit the entire cube. This in turn means that the objects close to the near plane won't receive enough texture samples. See Figure 14-7.

Figure 14-7 The Light Camera with a Low Light Angle

The closer the light source is to the unit cube, the poorer the quality. As we know,

$$Q = \frac{Z_f}{Z_f - Z_n}$$

so for large outdoor scenes that have $Z_n = 1$ and $Z_f = 4000$, $Q \approx 1.0002$, which means that the light source is extremely close to the unit cube. The ratio $Z_f/Z_n$ is usually bigger than 50, which corresponds to $Q \approx 1.02$, close enough to create problems.

We'll always have problems fitting the entire unit cube into a single shadow map texture. Two solutions each tackle one part of the problem: Unit cube clipping targets the light camera only on the necessary part of the unit cube, and the cube map approach uses multiple textures to store depth information.

Unit Cube Clipping

This optimization relies on the fact that we need shadow map information only on actual objects, and the volume occupied by these objects is usually much smaller than the whole view frustum volume (especially close to the far plane). That's why, if we tune the light camera to hold only real objects (not the entire unit cube), we get better quality. Of course, we should tune the camera using a simplified scene structure, such as bounding volumes.

Cube clipping was mentioned in the original PSM article (Stamminger and Drettakis 2002), but there it took into account all objects in a scene, including shadow casters in the view frustum and potential shadow casters outside the frustum, for constructing the virtual camera. Because we don't need virtual cameras anymore, we can focus the light camera on shadow receivers only, which is more efficient. See Figure 14-8. We still have to choose near and far clip-plane values for the light camera in post-projective space so that the shadow map holds all shadow casters, but this choice doesn't influence shadow quality, because it doesn't change the texel area distribution.


Figure 14-8 Focusing the Light Camera Based on the Bounding Volumes of Shadow Receivers

Because faraway parts of these bounding volumes contract greatly in post-projective space, the light camera's field of view doesn't become very large, even with light sources that are close to the rest of the scene.

In practice, we can use rough bounding volumes to retain sufficient quality—we just need to indicate generally which part of the scene we are interested in. In outdoor scenes, it's the approximate height of objects on the landscape; in indoor scenes, it's a bounding volume of the current room, and so on.

We'd like to formalize the algorithm for computing a light camera focused on the shadow receivers in the scene, once we have built a set of bounding volumes roughly describing the scene. The light camera is given by its position, direction, up vector, and projection parameters, most of which are predefined.

So the most interesting thing is choosing the light camera direction based on bounding volumes. The proposed algorithm is this:

  1. Compute the vertex list of the constructive solid geometry operation $\bigcup_i (B_i \cap F)$, where $B_i$ is the ith bounding volume and $F$ is the view frustum, for every shadow receiver bounding volume that we see in the current frame; all these operations are performed in view camera space. Then transform all these vertices into post-projective space. After this step, we have all the points that the light camera should "see." (By the way, we should find a good near-plane value based on these points, because reading back the depth buffer isn't a good solution.)
  2. Find a light camera. As we already know, this means finding the best light camera direction, because all other parameters are easily computed for a given direction. We propose approximating the optimal direction by the axis of the minimal cone that has its apex at the light source and includes all the points in the list. The algorithm that finds the optimal cone for a set of points works in linear time, and it is similar to an algorithm that finds the smallest bounding sphere for a set of points in linear time (Gartner 1999).
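The linear-time minimal-cone algorithm itself is beyond a short example, but the rest of step 2 is easy to show. The sketch below substitutes a much cruder heuristic for the cone axis (the normalized mean of the unit directions from the light to each point) and then derives the half-angle, and hence the light camera's field of view, from that axis. It assumes all points lie well within one hemisphere as seen from the light; the resulting cone encloses every point but is generally wider than the minimal one. All names are hypothetical.

```python
import math
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / length, v[1] / length, v[2] / length)


def approximate_light_cone(light: Vec3, points: Iterable[Vec3]) -> Tuple[Vec3, float]:
    """Return (axis, half_angle) of a cone with apex at `light` enclosing all points.

    Heuristic stand-in for the minimal enclosing cone: the axis is the mean
    of the unit directions to the points, not the optimal axis.
    """
    dirs = [normalize((p[0] - light[0], p[1] - light[1], p[2] - light[2])) for p in points]
    axis = normalize((sum(d[0] for d in dirs), sum(d[1] for d in dirs), sum(d[2] for d in dirs)))
    # The widest angle between the axis and any point direction bounds the cone.
    min_cos = min(axis[0] * d[0] + axis[1] * d[1] + axis[2] * d[2] for d in dirs)
    half_angle = math.acos(max(-1.0, min(1.0, min_cos)))
    return axis, half_angle


if __name__ == "__main__":
    pts = [(1.0, 0.0, 5.0), (-1.0, 0.0, 5.0), (0.0, 1.0, 5.0), (0.0, -1.0, 5.0)]
    axis, half = approximate_light_cone((0.0, 0.0, 0.0), pts)
    print("axis:", axis, "field of view (deg):", math.degrees(2.0 * half))
```

The light camera's field of view is then twice the half-angle; a true minimal-cone solver would only tighten it.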

In this way, we can find an optimal light camera in time linear in the number of bounding volumes, which isn't very large because we need only rough information about the scene structure.

This algorithm is efficient for direct lights in large outdoor scenes. The shadow quality is almost independent of the light angle and slightly decreases if light is directed toward the camera. Figure 14-9 shows the difference between using unit cube clipping and not using it.


Figure 14-9 Unit Cube Clipping

Using Cube Maps

Though cube clipping is efficient in some cases, other times it's difficult to use. For example, we might have a densely filled unit cube (which is common), or we may not want to use bounding volumes at all. Plus, cube clipping does not work with point lights.

A more general method is to use a cube map texture for shadow mapping. Most light sources become point lights in post-projective space, and it's natural to use cube maps for shadow mapping with point light sources. But in post-projective space, things change slightly and we should use cube maps differently because we need to store information about the unit cube only.

The proposed solution is to use unit cube faces that are back facing, with respect to the light, as platforms for cube-map-face textures.

For a direct light source in post-projective space, the cube map looks like Figure 14-10.


Figure 14-10 Using a Cube Map for Direct Lights

The number of used cube map faces (ranging from three to five) depends on the position of the light. We use the maximum number of faces when the light is close to the rest of the scene and directed toward the camera, so additional texture resources are necessary. For other types of light sources located outside the unit cube, the pictures will be similar.

For a point light located inside the unit cube, we should use all six cube map faces, but they're still focused on unit cube faces. See Figure 14-11.


Figure 14-11 Using Cube Maps with a Point Light
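The face count follows from a simple plane test. A minimal sketch, assuming OpenGL-like post-projective coordinates where the unit cube spans [-1, 1] on every axis and the light is a finite point: a face is back facing with respect to the light when the light lies on the cube's side of that face's plane. A light inside the cube gets all six faces, and a light outside gets three to five. The helper names are illustrative.

```python
from typing import List, Tuple

Face = Tuple[int, float]  # (axis index 0/1/2 for x/y/z, sign of the outward normal)

# The six faces of the cube [-1, 1]^3, listed as +x, -x, +y, -y, +z, -z.
CUBE_FACES: List[Face] = [(axis, sign) for axis in range(3) for sign in (1.0, -1.0)]


def back_facing_faces(light: Tuple[float, float, float]) -> List[Face]:
    """Faces of the unit cube whose outward normals point away from the light."""
    faces = []
    for axis, sign in CUBE_FACES:
        # Face plane: sign * x[axis] = 1. If the light is on the inner side
        # (sign * light[axis] < 1), it sees the face from behind.
        if sign * light[axis] < 1.0:
            faces.append((axis, sign))
    return faces


if __name__ == "__main__":
    for light in [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0), (0.0, 3.0, 5.0), (5.0, 5.0, 5.0)]:
        print(light, "->", len(back_facing_faces(light)), "cube map faces")
```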

We could say we form a "cube map with displaced center," which is similar to a normal cube map, but with a constant vector added to its texture coordinates. In other words, texture coordinates for cube maps are vertex positions in post-projective space shifted by the light source position:

Texture coordinates = vertex position - light position

By choosing unit cube faces as the cube map platform, we distribute the texture area proportionally to the screen size and ensure that shadow quality doesn't depend on the light and camera positions. In fact, texel size in post-projective space is in a guaranteed range, so its projection on the screen depends only on the plane it's projected onto. This projection doesn't stretch texels much, so the texel size on the screen is within guaranteed bounds also.

Because the vertex and pixel shaders are relatively short when rendering the shadow map, what matters is the raw fill rate for the back buffer and the depth shadow map buffer. So, given good occlusion culling, there's almost no difference between drawing a single shadow map and drawing a cube map with the same total texture size, and the cube map approach gives better quality for that same total size. The extra costs are the render-target switches and the additional instructions that compute cube map texture coordinates in the vertex and pixel shaders.

Let's see how to compute these texture coordinates. First, consider the picture shown in Figure 14-12. The blue square is our unit cube, P is the light source point, and V is the point for which we're generating texture coordinates. We render all six cube map faces in separate passes for the shadow map; the near plane for each pass is shown in green. These near planes form another, smaller cube, so $Z_1 = Z_n/Z_f$ is the same for every pass.


Figure 14-12 A Detailed Cube Map View in Post-Projective Space

Now we should compute texture coordinates and depth values to compare for the point V. This just means that we should move this point in the $(V - P)$ direction until we intersect the cube. Consider $d_1, d_2, d_3, d_4, d_5,$ and $d_6$ (see the face numbers in Figure 14-12) as the distances from P to each cube map face.

The point on the cube we are looking for (which is also the cube map texture coordinate) is

$$a = \max\left(\frac{V_x - P_x}{d_1}, \frac{V_y - P_y}{d_2}, \frac{V_z - P_z}{d_3}, \frac{P_x - V_x}{d_4}, \frac{P_y - V_y}{d_5}, \frac{P_z - V_z}{d_6}\right), \qquad T = P + \frac{V - P}{a}.$$

Compare the value in the texture against the result of the projective transform of the value a. Because a is already divided by the corresponding d value, effectively making $Z_f = 1$ and $Z_n = Z_1$, all we have to do is apply that projective transform. Note that in the case of the inverse camera projection from Section 14.2.1, $Z_n = -Z_1$ and $Z_f = Z_1$.

(All these calculations are made in OpenGL-like coordinates, where the unit cube spans [-1, 1] on every axis. In Direct3D, the cube is half as deep, because the z coordinate is in the [0, 1] range.)

Listing 14-1 is an example of how the shader code might look.

Listing 14-1. Shader Code for Computing Cube Map Texture Coordinates

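// Inputs:
// c[d1] = 1/d1, 1/d2, 1/d3, 0
// c[d2] = -1/d4, -1/d5, -1/d6, 0
// c[z]  = Q, -Q * Zn, 0, 0   (with Zf = 1 and Zn = Z1)
// c[P]  = P                  (light position)
// r[V]  = V                  (point position)
// Outputs:
// cbmcoord - output cube map texture coordinates
// depth    - depth to compare with shadow map values

// Per-vertex level
sub r[VP], r[V], c[P]            // V - P
mul r1, r[VP], c[d1]             // ratios for faces 1, 2, 3
mul r2, r[VP], c[d2]             // ratios for faces 4, 5, 6

// Per-pixel level
max r3, r1, r2                   // larger ratio on each axis
max r3.x, r3.x, r3.y
max r3.x, r3.x, r3.z             // r3.x = a, the largest ratio
rcp r3.w, r3.x                   // r3.w = 1/a
mad cbmcoord, r[VP], r3.w, c[P]  // P + (V - P)/a, the point on the cube

// Projective transform of a: depth = Q - Q * Zn / a
mad depth, r3.w, c[z].y, c[z].x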
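For checking the math on the CPU, the same steps can be written out in Python. This is a sketch, not the chapter's code: it assumes d1, d2, d3 belong to the +x, +y, +z faces and d4, d5, d6 to the -x, -y, -z faces (the sign layout of c[d1] and c[d2] in Listing 14-1), assumes V differs from P, and applies the projective transform with $Z_f = 1$ and $Z_n = Z_1$, so $Q = 1/(1 - Z_1)$. The function name is hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def cube_map_lookup(v: Vec3, p: Vec3, d: Sequence[float], z1: float) -> Tuple[Vec3, float]:
    """Return (point on the cube along V - P, depth to compare with the shadow map).

    v: point in post-projective space; p: light position;
    d: distances (d1..d6) from P to the +x, +y, +z, -x, -y, -z faces;
    z1: Zn / Zf, shared by all six passes.
    """
    vp = (v[0] - p[0], v[1] - p[1], v[2] - p[2])
    # Ratios for the three positive faces and the three negative faces.
    ratios = (vp[0] / d[0], vp[1] / d[1], vp[2] / d[2],
              -vp[0] / d[3], -vp[1] / d[4], -vp[2] / d[5])
    # Largest ratio: the face the ray from P through V reaches first.
    a = max(ratios)
    point = (p[0] + vp[0] / a, p[1] + vp[1] / a, p[2] + vp[2] / a)
    # Projective transform with Zf = 1, Zn = Z1: depth = Q * (1 - Z1 / a).
    q = 1.0 / (1.0 - z1)
    depth = q * (1.0 - z1 / a)
    return point, depth
```

A point lying exactly on a cube face gets depth 1, and a point on the small near-plane cube gets depth 0, matching an ordinary perspective depth range.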

Because depth textures cannot be cube maps, we could use color textures instead, packing depth values into the color channels. There are many ways to do this and many implementation-dependent tricks, but describing them is beyond the scope of this chapter.

Another possibility is to emulate this cube map approach with multitexturing, in which every cube map face becomes an independent texture (depth textures work well in this case). We form several texture coordinate sets in the vertex shader and multiply the shadow results together in the pixel shader. The tricky part is managing these textures across the objects in the scene, because an object rarely needs all six faces.

14.2.3 Biasing

As we stated earlier, the constant bias that is typically used in uniform shadow maps cannot be used with PSMs because the z values and the texel area distributions vary greatly with different light positions and points in the scene.

If you plan to use depth textures, try z slope–scaled bias. It's often enough to fix the artifacts, especially when very distant objects don't fall within the camera's view. However, some cards do not support depth textures (in DirectX, depth textures are supported only by NVIDIA cards), and depth textures can't be cube maps. In these cases, you need a different, more general algorithm for calculating bias. Another difficulty is that z slope–scaled bias is hard to emulate and tweak, because it requires passing additional data, such as the vertex coordinates of the current triangle, into the pixel shader, plus extra calculations, and the result isn't robust at all.

Anyway, let's see why we can't use constant bias anymore. Consider these two cases: the light source is near the unit cube, and the light source is far from the unit cube. See Figure 14-13.


Figure 14-13 Light Close to and Far from the Unit Cube

The problem is that the ratio $Z_f/Z_n$, which determines how z values are distributed in the shadow map, varies a lot between these two cases. So a constant bias would mean a totally different actual bias in world and post-projective space: a constant bias tuned to the first light position won't be correct for the second, and vice versa. And $Z_f/Z_n$ does change a lot, because the light source can be close to the unit cube or distant (even at infinity), depending on the relative positions of the light and the camera in world space.

Even with a fixed light source position, sometimes we cannot find a suitable constant for the bias. The bias should depend on the point position—because the projective transform enlarges the near objects and shrinks the far ones—so the bias should be smaller near the camera and bigger for distant objects. Figure 14-14 shows the typical artifacts of using a constant bias in this situation.


Figure 14-14 Artifacts with Constant Bias

In short, the proposed solution is to compute the bias in world space (rather than analyze the results of the double-projection matrix) and then transform this world-space bias into post-projective space. The transformed value depends on the double projection, so it's correct for any light and camera position. These operations can be done easily in a vertex shader. Furthermore, the world-space bias should be scaled by the texel size in world space to deal with artifacts caused by the nonuniform distribution of texel areas.

$$P_{\text{biased}} = \left(P_{\text{orig}} + L\,(a + b\,L_{\text{texel}})\right) M,$$

where $P_{\text{orig}}$ is the original point position, $L$ is the light vector direction in world space, $L_{\text{texel}}$ is the texel size in world space, $M$ is the final shadow map matrix, and $a$ and $b$ are bias coefficients.

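The texel size in world space can be approximated with simple matrix calculations. First, transform the point into shadow map space, and then shift this point by the texel size without changing its depth. Next, transform it back into world space and take the squared length of the difference between this point and the original one. This gives us $L_{\text{texel}}$:

$$L_{\text{texel}} = \left\| P_{\text{orig}}\, M\, T\, M^{-1} - P_{\text{orig}} \right\|^2, \qquad T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1/S_x & 1/S_y & 0 & 1 \end{pmatrix},$$

where $T$ shifts a point by one texel in x and y, and $S_x$ and $S_y$ are the shadow map resolutions.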

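Obviously, we can build a single matrix that performs all these transformations (everything except squaring the length, of course):

$$L_{\text{texel}} = \left\| P_{\text{orig}}\, M' \right\|^2, \qquad M' = M\, T\, M^{-1} - I,$$

where $M'$ includes transforming ($M$), shifting by one texel ($T$), transforming back ($M^{-1}$), and subtracting the original point ($I$).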
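Both formulas can be prototyped on the CPU before committing them to a vertex shader. A minimal sketch, assuming row vectors ($P M$, as in the equations above), world-space points with w = 1, a caller that already has both M and its inverse, and texel offsets of 1/S_x and 1/S_y in the space M maps into. Following the text, L_texel is the squared length of the offset. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Mat4 = Sequence[Sequence[float]]


def vec_mat(v: Sequence[float], m: Mat4) -> List[float]:
    """Row vector times 4x4 matrix (row-vector convention: v * M)."""
    return [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]


def ltexel(p: Sequence[float], m: Mat4, m_inv: Mat4, sx: int, sy: int) -> float:
    """Squared world-space length of a one-texel shift of point p (w = 1)."""
    q = vec_mat(p, m)                     # into shadow map space (homogeneous)
    # Shift by one texel in x and y, keeping depth; scaling by w makes the
    # shift exactly 1/Sx and 1/Sy after the perspective divide.
    q = [q[0] + q[3] / sx, q[1] + q[3] / sy, q[2], q[3]]
    r = vec_mat(q, m_inv)                 # back into world space
    r = [c / r[3] for c in r]             # dehomogenize
    return sum((r[i] - p[i]) ** 2 for i in range(3))


def biased_zw(p: Sequence[float], light_dir: Sequence[float], a: float, b: float,
              m: Mat4, l_texel: float) -> Tuple[float, float]:
    """Return (z, w) of (P + L * (a + b * Ltexel)) * M."""
    k = a + b * l_texel
    moved = [p[0] + light_dir[0] * k, p[1] + light_dir[1] * k, p[2] + light_dir[2] * k, p[3]]
    out = vec_mat(moved, m)
    return out[2], out[3]
```

In a shader, the shift, the inverse transform, and the subtraction would be folded into the single matrix M' as described above; keeping them separate here only makes each step easy to inspect.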

This is a rather empirical solution, and it should be tweaked for your particular needs. See Figure 14-15.


Figure 14-15 Bias Calculated in the Vertex Shader

The vertex shader code that performs these calculations might look like Listing 14-2.

Listing 14-2. Calculating Bias in a Vertex Shader

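def c0, a, b, 0, 0          // bias coefficients a and b

// Calculating Ltexel: r1 = v0 * M' (transform into shadow map
// space, shift by one texel, transform back, subtract original point)
dp4 r1.x, v0, c[LtexelMatrix_0]
dp4 r1.y, v0, c[LtexelMatrix_1]
dp4 r1.z, v0, c[LtexelMatrix_2]
dp4 r1.w, v0, c[LtexelMatrix_3]

// Transforming homogeneous coordinates
// (in fact, we often can skip this step)
rcp r1.w, r1.w
mul r1.xy, r1.w, r1.xy

// Squared length of the offset: now r1.x is Ltexel
dp3 r1.x, r1, r1

// r1.x = a + b * Ltexel
mad r1.x, r1.x, c0.y, c0.x

// Move vertex in world space along the light direction:
// r1 = v0 + L * (a + b * Ltexel); c[Lightdir].w must be 0
mad r1, c[Lightdir], r1.x, v0

// Transform vertex into post-projective space
// (we need z and w only)
dp4 r[out].z, r1, c[M_2]
dp4 r[out].w, r1, c[M_3]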

The r[out] register holds the result of the biasing: the depth value, and the corresponding w, that should be interpolated across the triangle. Note that this interpolation should be separate from the interpolation of texture coordinates (x, y, and the corresponding w), because these w coordinates are different. This biased value could be used when comparing with the shadow map value, or during the actual shadow map rendering (the shadow map holds biased values).

14.3 Tricks for Better Shadow Maps

One advantage of shadow mapping over shadow volumes is the potential to create a color gradient between "shadowed" and "nonshadowed" samples, thus simulating soft shadows. This shadow "softness" doesn't depend on distance from the occluder, light source size, and so on, but it still works in world space. Blurring stencil shadows, on the other hand, is more difficult, although Assarsson et al. (2003) have made significant progress.

This section covers methods of filtering and blurring shadow maps to create a fake shadow softness that has a constant range of blurring but still looks good.

14.3.1 Filtering

Most methods of shadow map filtering are based on the percentage-closer filtering (PCF) principle. The only difference among the methods is how the hardware lets us use it. NVIDIA depth textures perform PCF after the comparison with the depth value; on other hardware, we have to take several samples from the nearest texels and average their results (true PCF). In general, depth texture filtering is more efficient than the manual PCF technique with four samples. (Manual PCF needs about eight samples to produce comparable quality.) In addition, using depth texture filtering doesn't preclude PCF, so we can take several filtered samples to further increase shadow quality.

Using PCF with PSMs is no different from using it with standard shadow maps: samples from neighboring texels are used for filtering. On the GPU, this is achieved by shifting texture coordinates one texel in each direction. For a more detailed discussion of PCF, see Chapter 11, "Shadow Map Antialiasing."

The shader pseudocode for PCF with four samples looks like Listings 14-3 and 14-4.

These tricks improve shadow quality, but they do not hide serious aliasing problems. For example, if many screen pixels map to one shadow map texel, large stair-stepping artifacts will be visible, even if they are somewhat blurred. Figure 14-16 shows an aliased shadow without any filtering, and Figure 14-17 shows how PCF helps improve shadow quality but cannot completely remove aliasing artifacts.


Figure 14-16 Strong Aliasing


Figure 14-17 Filtered Aliasing

Listing 14-3. Vertex Shader Pseudocode for PCF

// Texel offsets for the four samples
def c0, sample1X, sample1Y, 0, 0
def c1, sample2X, sample2Y, 0, 0
def c2, sample3X, sample3Y, 0, 0
def c3, sample4X, sample4Y, 0, 0
// The simplest case (one texel along each diagonal):
// def c0,  1 / shadowmapsizeX,  1 / shadowmapsizeY, 0, 0
// def c1, -1 / shadowmapsizeX, -1 / shadowmapsizeY, 0, 0
// def c2, -1 / shadowmapsizeX,  1 / shadowmapsizeY, 0, 0
// def c3,  1 / shadowmapsizeX, -1 / shadowmapsizeY, 0, 0
. . .
// Point - vertex position in light space
// Offsets are scaled by point.w so they survive the projective divide
mad oT0, point.w, c0, point
mad oT1, point.w, c1, point
mad oT2, point.w, c2, point
mad oT3, point.w, c3, point

Listing 14-4. Pixel Shader Pseudocode for PCF

def c0, 0.25, 0.25, 0.25, 0.25   // weight of each of the four samples
tex t0
tex t1
tex t2
tex t3
. . .
// After depth comparison, average the four results
mul r0, t0, c0
mad r0, t1, c0, r0
mad r0, t2, c0, r0
mad r0, t3, c0, r0
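To see what the two listings compute together, here is a CPU sketch of the same four-sample PCF: offset the lookup by one texel along each diagonal (the "simplest case" constants in Listing 14-3), compare each nearest-texel sample with the receiver depth, and average the four binary results with weight 0.25. The clamp-to-edge addressing and the "lit if depth <= stored depth" convention are assumptions, and biasing is left to the caller.

```python
from typing import Sequence

# Diagonal one-texel offsets, matching the "simplest case" constants.
OFFSETS = [(1, 1), (-1, -1), (-1, 1), (1, -1)]


def pcf4(shadow_map: Sequence[Sequence[float]], u: float, v: float, depth: float) -> float:
    """Fraction of the four samples that are lit (0.0 = fully shadowed, 1.0 = lit)."""
    height = len(shadow_map)
    width = len(shadow_map[0])
    lit = 0.0
    for dx, dy in OFFSETS:
        # Shift the texture coordinate by one texel, then fetch the nearest
        # texel with clamp-to-edge addressing.
        x = min(max(int((u + dx / width) * width), 0), width - 1)
        y = min(max(int((v + dy / height) * height), 0), height - 1)
        # Compare first, then filter: each sample contributes 0 or 0.25.
        if depth <= shadow_map[y][x]:
            lit += 0.25
    return lit
```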

14.3.2 Blurring

As we know from projective shadows, the best blurring results often come from rendering to a lower-resolution texture with a pixel shader blur, then feeding the resulting texture back through the blur pixel shader several times (known as ping-pong rendering). Shadow mapping and projective shadows are similar techniques, so why can't we use this method? Because the shadow map isn't a black-and-white picture; it's a collection of depth values, and "blurring a depth map" doesn't make sense.

Instead, the proposal is to use the color part of the shadow map render (which comes almost for free) as a projective texture for some objects. For example, assume that we have an outdoor landscape scene and we want a high-quality blurred shadow on the ground, because ground shadows are the most noticeable.

  1. Before rendering the depth shadow map, clear the color buffer to 1. During the render, draw 0 into the color buffer for every object except the landscape; for the landscape, draw 1. After the whole shadow map renders, we'll have 1 where the landscape is nonshadowed and 0 where it's shadowed. See Figure 14-18.

    Figure 14-18 The Original Color Part for a Small Test Scene

  2. Blur the picture in Figure 14-18 heavily, using multiple passes with a simple blur pixel shader. For example, a simple two-pass Gaussian blur gives good results. (You might want to adjust the blurring radius for distant objects.) After this step, we'll have a high-quality blurred texture, as shown in Figure 14-19.

    Figure 14-19 The Blurred Color Part for a Small Test Scene

  3. While rendering the scene with shadows, render the landscape with the blurred texture instead of the shadow map, and render all other objects with the depth part of the shadow map. See Figure 14-20.

    Figure 14-20 Applying Blurring to a Real Scene
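The first two steps can be prototyped on the CPU. The sketch below assumes a per-texel buffer naming the nearest surface seen from the light (None where nothing was drawn), builds the step 1 mask, and runs a separable two-pass blur. The 5-tap binomial kernel (1, 4, 6, 4, 1)/16 is an assumed stand-in for the "simple two-pass Gaussian blur," and edges are clamped. A real implementation would run each pass in a pixel shader with ping-pong render targets.

```python
from typing import List, Optional, Sequence

# 5-tap binomial weights, a common small approximation of a Gaussian (sums to 1).
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def landscape_mask(nearest: Sequence[Sequence[Optional[str]]], landscape: str) -> List[List[float]]:
    """Step 1: 1.0 where the light sees the landscape (or nothing), 0.0 where an occluder was drawn."""
    return [[1.0 if obj is None or obj == landscape else 0.0 for obj in row] for row in nearest]


def blur_1d(row: Sequence[float]) -> List[float]:
    n = len(row)
    r = len(KERNEL) // 2
    # Clamp-to-edge sampling at the borders.
    return [sum(KERNEL[k + r] * row[min(max(i + k, 0), n - 1)] for k in range(-r, r + 1))
            for i in range(n)]


def blur_2d(image: Sequence[Sequence[float]]) -> List[List[float]]:
    """Step 2: separable blur, a horizontal pass followed by a vertical pass."""
    horizontal = [blur_1d(row) for row in image]
    vertical_columns = [blur_1d(list(col)) for col in zip(*horizontal)]
    return [list(row) for row in zip(*vertical_columns)]
```

Running the blur several times widens the penumbra further, which mirrors the multiple passes suggested in step 2.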

The difference in quality is dramatic.

Of course, we can use this method not only with landscapes, but also with any object that does not need self-shadowing (such as floors, walls, ground planes, and so on). Conveniently, these are the surfaces where shadows are most noticeable and aliasing problems are most evident. Because we have several color channels, we can blur shadows on several such objects at the same time, one object per channel.

This way we'll have nice blurred shadows on the ground, floor, walls, and so on while retaining all other shadows (blurred with PCF) on other objects (with proper self-shadowing).

14.4 Results

The screenshots in Figures 14-21, 14-22, 14-23, and 14-24 were captured on an NVIDIA GeForce4 Ti 4600 at a screen resolution of 1600x1200, with 100,000 to 500,000 visible polygons. All objects receive and cast shadows at real-time frame rates (more than 30 frames per second).


Figure 14-21


Figure 14-22


Figure 14-23


Figure 14-24

14.5 References

Assarsson, U., M. Dougherty, M. Mounier, and T. Akenine-Möller. 2003. "An Optimized Soft Shadow Volume Algorithm with Real-Time Performance." In Proceedings of the SIGGRAPH/Eurographics Workshop on Graphics Hardware 2003.

Gartner, Bernd. 1999. "Smallest Enclosing Balls: Fast and Robust in C++." Web page. http://www.inf.ethz.ch/personal/gaertner/texts/own_work/esa99_final.pdf

Stamminger, Marc, and George Drettakis. 2002. "Perspective Shadow Maps." In Proceedings of ACM SIGGRAPH 2002.

The author would like to thank Peter Popov for his many helpful and productive discussions.


Chapter 15. Managing Visibility for Per-Pixel Lighting

John O'Rorke
Monolith Productions

15.1 Visibility in a GPU Book?

This chapter looks at the role that visibility plays in efficiently rendering per-pixel lit scenes. We also consider how to use that visibility to minimize the number of batches that must be rendered, so that we can improve performance.

At first glance, you may think that visibility has no place in a book about the advanced use of graphics hardware. Yet, regardless of how many tricks and optimizations we use on the GPU, the fastest polygon will always be the one that isn't rendered. That means if we reduce the number of rendered batches, we can add more objects to a scene, or use more complex geometry and techniques for other objects.

Countless papers and presentations tout reducing the number of batches sent to the graphics card to keep the application from becoming CPU-bound. To clarify terminology, in this chapter a batch is any set of polygons that is sent to the card without being broken up by a state change. For example, a single DrawPrimitive call in Direct3D represents one batch. Reducing batches has been important in the past and is even more significant with the latest GPUs. GPUs process the polygons within a single batch faster and faster, but the rate at which batches can be sent to the GPU is increasing very slowly. Compounding this problem is the current trend of using per-pixel lighting, which substantially increases the number of batches required to render a scene.

15.2 Batches and Per-Pixel Lighting

The reason for the significant rise in the number of batches when using per-pixel lighting comes from the manner in which this technique is implemented. The algorithm renders an ambient pass to apply a global ambient term and to establish the depth buffer for the frame. Then for each light, two values must be determined: the shadowing term and the lighting contribution.

The shadowing term is often computed using extruded stencil volumes or shadow maps, both of which require at least one batch to be rendered per object. Once the shadowing term is determined, each object the light touches must be rendered again to apply the lighting contribution, with some objects being masked out appropriately when in shadow.

Many developers using per-pixel lighting have struggled to get scenes to batch and render efficiently. In effect, the number of batches required for this new technique multiplies the previous problem by the number of lights in the scene—making efficient rendering even more difficult to manage.

15.2.1 A Per-Pixel Example

To demonstrate how many batches could be required to render a per-pixel lit scene, let's examine a very simple scenario. We have a room, divided into eight separate batches because of the different materials in different parts of the room. In addition, there are three different models in the room, each of which is separated into two batches to allow for matrix-palette skinning. Placed within this room are three lights.

So now let's determine how many batches we need. For the ambient pass, which establishes the depth values of the scene and applies any ambient or emissive lighting, all batches are rendered once. That's 14 batches to start. Then for each light, all batches are rendered once for the shadow and once again for the actual lighting pass, to accumulate the lights into the frame buffer. In this simple scene, we are already up to 98 batches (14 + 3 × 2 × 14). We know that around 10,000 to 40,000 batches per second can be sent to the graphics hardware, consuming the full CPU time of a 1 GHz processor (Wloka 2003). If that CPU speed is our minimum specification and only 10 percent can be allocated to batch submission, then only 1,000 to 4,000 batches per second are possible! Thus, in this simple scene, batch submission alone will restrict our frame rate to a range of 10 to 40 frames a second. Realistic scenes require many more batches than this example, so lots of effort must be spent to bring the number of batches into a reasonable range.
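The arithmetic in this example is easy to redo for other scenes. A small sketch, assuming every batch is drawn once in the ambient pass and twice per light (shadow plus lighting), as in the text, and using the 10,000 to 40,000 batches-per-second range quoted from Wloka (2003). The function names are illustrative.

```python
def batches_per_frame(room_batches: int, models: int, batches_per_model: int, lights: int) -> int:
    scene = room_batches + models * batches_per_model  # batches in one full pass
    # One ambient pass, then a shadow pass and a lighting pass for every light.
    return scene + lights * 2 * scene


def frame_rate_limit(batches: int, batches_per_second: float, cpu_share: float) -> float:
    """Upper bound on frames per second imposed by batch submission alone."""
    return batches_per_second * cpu_share / batches


if __name__ == "__main__":
    n = batches_per_frame(room_batches=8, models=3, batches_per_model=2, lights=3)
    low = frame_rate_limit(n, 10_000, 0.10)
    high = frame_rate_limit(n, 40_000, 0.10)
    # prints: 98 batches per frame -> 10.2 to 40.8 frames per second
    print(f"{n} batches per frame -> {low:.1f} to {high:.1f} frames per second")
```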

The intent of this chapter is not to examine various visibility algorithms or implement a visibility system. (An excellent discussion of visibility algorithms can be found in Akenine-Möller and Haines 2002.) Instead, this chapter illustrates how to leverage existing visibility algorithms to suit the unique needs of per-pixel lighting, with the goal of minimizing the number of batches sent to the hardware.

15.2.2 Just How Many Batches, Anyway?

The following pseudocode illustrates the number of batches that must be rendered in a scene.

For each visible object
  For each pass in the ambient shader
    For each visible batch in the object
      Render batch
For each visible light
  For each visible shadow caster
    For each pass in the shadow shader
      For each shadow batch in the object
        Render batch
  For each lit visible object
    For each pass in the light shader
      For each visible batch in the object
        Render batch
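The loop structure above translates directly into a batch counter, which is handy for estimating how a change to pass counts or batch splitting plays out. A minimal sketch, assuming each object records how many batches it uses for lighting and for shadows, and that each light already knows its visible shadow casters and lit objects; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneObject:
    name: str
    batches: int         # visible batches used in the ambient and lighting passes
    shadow_batches: int  # batches used in the shadow pass (often fewer)


@dataclass
class Light:
    shadow_casters: List[SceneObject] = field(default_factory=list)
    lit_objects: List[SceneObject] = field(default_factory=list)


def count_batches(visible: List[SceneObject], lights: List[Light],
                  ambient_passes: int = 1, shadow_passes: int = 1, light_passes: int = 1) -> int:
    total = sum(ambient_passes * obj.batches for obj in visible)
    for light in lights:
        total += sum(shadow_passes * obj.shadow_batches for obj in light.shadow_casters)
        total += sum(light_passes * obj.batches for obj in light.lit_objects)
    return total


if __name__ == "__main__":
    room = SceneObject("room", batches=8, shadow_batches=8)
    models = [SceneObject(f"model{i}", batches=2, shadow_batches=2) for i in range(3)]
    everything = [room] + models
    lights = [Light(everything, everything) for _ in range(3)]
    print(count_batches(everything, lights))  # 98, the example scene
    # Merging the room's material batches for shadow rendering only:
    room.shadow_batches = 1
    print(count_batches(everything, lights))  # 77
```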

As the pseudocode shows, some optimizations unrelated to visibility can be performed to reduce the number of batches. The largest of these is reducing the number of passes required to render the batches for each lighting situation: the number of batches increases linearly with the number of passes, so we should minimize passes in CPU-bound games.

The pseudocode also shows different batches being used for the shadow rendering. Although extruded shadow volumes almost always use separate batches, shadow map implementations do so less frequently. Having different batches for normal and shadow rendering is beneficial because certain batch boundaries can often be removed when performing shadow rendering. For example, picture a mesh that has two different materials used on two distinct parts. The materials are visible only when we perform lighting operations, and they are irrelevant for the shadow operation. Therefore, during the preprocessing of a model, two collections of batches may result: one for use when rendering lighting and another for use when rendering shadows. Both the pass-reduction and the batch-reduction techniques are critical to reducing batch levels, but they are not enough by themselves. By using visibility testing, we can prune a significant number of the batches in a scene and achieve much greater performance.

15.3 Visibility As Sets

To understand how to use visibility to reduce the number of batches, we'll take a high-level look at the uses of visibility and define the operations in terms of set logic. Then we'll describe how to compute these sets. Visibility is considered not only for the viewer of a scene, but also for each light of a scene, because if a light cannot see an object, that object does not need to be rendered in the light's lighting pass.

15.3.1 The Visible Set

The first set to define is the visible set, which consists of all objects that are visible from the point of view of the camera. Nearly every rendering application can determine the visible set, which we refer to as V.

15.3.2 The Lights Set

In addition to finding out which objects are visible in the scene, we need to determine which lights are visible.

For each visible light, another visibility set must be created, this time from the point of view of the light. Let L denote the set of objects that are visible from the light. Most per-pixel lighting solutions apply one light at a time, simply accumulating the results in the frame buffer. As a result, there are often no dependencies between lights, so only one light visible set needs to exist at a time, for the light currently being rendered. This avoids having to store all light sets in memory at once; the concept also extends to rendering multiple lights in a single pass, by accumulating their visible sets into a single lighting set.

Now that we have defined visibility sets for the viewer and the lights, we can establish several sets that will reduce the number of objects drawn in rendering.

15.3.3 The Illumination Set

The first rule of determining the illumination set is that a lighting pass needs to occur only on the set of objects that exist in both sets V and L, or using set notation: V ∩ L. This is because only those objects that the light can see need to be rendered again to provide the light contribution, and only those objects in the visible set are seen on the screen. This set of objects that will be rendered again in order to be illuminated is denoted as set I. If I is empty, we can skip rendering the light. This rule is typically fast and simple to apply, and it works well for quickly optimizing lights that are on the edge of the view frustum, have a very large radius, or are occluded within the frustum.

15.3.4 The Shadow Set

Now that the set of objects for the lighting pass has been reduced to a reasonable level, let's look at the set of objects for the shadow pass. The shadow set is more difficult to determine, and some balancing must be done between overall culling cost and the number of objects rendered.

A common initial mistake is to use set I for the shadow pass. However, as shown in Figure 15-1, sometimes an object outside the frustum can affect the final rendered image by projecting a shadow into the frustum. So we must generate a different set, called S, that is a subset of set L and includes all objects that cast a shadow into the visible region.


Figure 15-1 Objects Not in the Visible Set Can Influence the Rendered Scene

At this point, we have defined all the sets we need for rendering with per-pixel lighting. First set V is generated from the camera and rendered for the ambient pass. Next is the rendering of each light. For each light, set L is determined and from this, sets I and S are generated. Set S determines the shadowing term for the light, and then each object in set I is rendered to apply the lighting.
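In code, the per-light bookkeeping reduces to a few set operations. A minimal sketch using Python sets of object names; `casts_into_view` is a hypothetical predicate standing in for whatever geometric test decides that an object's shadow reaches the visible region (the chapter does not prescribe one at this point).

```python
from typing import Callable, Dict, List, Set, Tuple

Draw = Tuple[str, str, str]  # (pass name, light name, object name)


def plan_frame(visible: Set[str], light_sets: Dict[str, Set[str]],
               casts_into_view: Callable[[str, str], bool]) -> List[Draw]:
    """Order the draws for one frame: ambient pass on V, then S and I per light."""
    draws: List[Draw] = [("ambient", "", obj) for obj in sorted(visible)]
    for light, seen_by_light in light_sets.items():
        illuminated = visible & seen_by_light  # I = V intersect L
        if not illuminated:                    # nothing visible is lit: skip the light
            continue
        # S is a subset of L and may include objects outside the view frustum.
        shadow = {obj for obj in seen_by_light if casts_into_view(light, obj)}
        draws += [("shadow", light, obj) for obj in sorted(shadow)]
        draws += [("light", light, obj) for obj in sorted(illuminated)]
    return draws
```

Note that the shadow set is built from L rather than from I, so an off-screen caster like the one in Figure 15-1 still gets drawn in the shadow pass.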

Now we discuss the details of efficiently generating each set.

15.4 Generating Sets

Theory is great, but small details can make the difference between high stress and high frame rates. So in this section, we cover the fine points of generating each set introduced in the preceding section, with practical application in mind.

15.4.1 The Visible Set

The lights and the viewport each need to generate a visible set. But how tight should the visible set be? And how much processor time should be spent determining these sets? The answers depend on the type of application being developed. However, at the very minimum, the visibility determination algorithm should perform frustum-level culling and a fair amount of object-level occlusion. The reason for this requirement is simply a matter of scale. If a standard visible set contains ten objects in the frustum and 30 percent of the objects are occluded, three objects can be dropped.

Now let's factor in the lights, because visibility is also computed for every light in the scene, and every object a light affects is rendered again for that light. If each light has similar occlusion statistics, then in realistic scenes we can avoid rendering dozens or even hundreds of objects per frame. So the amount of occlusion within a frustum should be carefully considered when choosing the visibility system for a per-pixel lighting renderer. From there, it is simply a balancing act to find the best trade-off between CPU time spent and objects culled.

15.4.2 The Lights Set

Determining this set is nearly identical to determining the visible set. However, point lights can cause problems for visibility algorithms that perform any sort of projection onto a plane. A point light has a full spherical field of view, so it cannot easily be mapped onto a plane, which is something many visibility algorithms rely on. One solution is to place a cube around the light and perform the visibility test once for each face of the cube, from the light's point of view. This method can become very expensive, though, because it requires six separate visibility determinations. Care must also be taken not to add the same object to the light's visible set more than once if it is seen through several faces. As a general rule, examine the visibility system; if it uses any form of projection onto a plane, consider switching to a visibility system that works without projections, or implementing a separate visibility system exclusively for point-light queries.
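A minimal C++ sketch of the cube-face approach follows, mainly to show the duplicate filtering. It assumes a hypothetical `FaceQuery` callback that runs the existing planar visibility query for one cube face (index 0 to 5) and returns integer object IDs; a real engine would plug in its own query and ID type.

```cpp
#include <cassert>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical callback: runs the existing planar visibility query for one
// 90-degree face of a cube centered on the light (face = 0..5) and returns
// the IDs of the objects it finds.
using FaceQuery = std::function<std::vector<int>(int face)>;

// Builds set L for a point light from six per-face queries. An object that
// straddles two faces is reported by both queries, so duplicates are
// filtered out while first-seen order is preserved.
std::vector<int> pointLightSet(const FaceQuery& queryFace) {
    std::unordered_set<int> seen;
    std::vector<int> lightSet;
    for (int face = 0; face < 6; ++face) {
        for (int id : queryFace(face)) {
            if (seen.insert(id).second) {  // .second is true only on first insert
                lightSet.push_back(id);
            }
        }
    }
    return lightSet;
}
```

A hash set keeps the filtering cheap; if object IDs are small dense integers, a per-object flag would serve the same purpose.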

15.4.3 The Illumination Set

Fast set operations are critical for efficiently determining the illumination set. There are two approaches to implementing the necessary set operations: (1) have sets that know which objects they contain and (2) have objects that know which sets they belong to.

The first approach uses sorted lists that contain references to all the objects in each set. The lists must be sorted so that intersections and unions can be computed in linear time by merging; the sort key can be whatever is appropriate, as long as every list uses the same one. This implementation can be difficult to make efficient because it involves sorting, merging, and occasionally searching, but it works well for applications that already need these sets for other operations. With this approach, determining the illumination set is simply a matter of determining sets V and L and then finding the elements contained in both.
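Assuming each set is stored as a `std::vector<int>` of object IDs sorted in ascending order (using the ID as the sort key is an assumption of this sketch), the illumination set, the intersection of V and L, falls out of a single merge pass:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Intersects two ascending ID lists in O(|a| + |b|) by walking them together.
std::vector<int> intersectSorted(const std::vector<int>& a,
                                 const std::vector<int>& b) {
    std::vector<int> out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;                  // a[i] cannot appear later in b
        } else if (b[j] < a[i]) {
            ++j;                  // b[j] cannot appear later in a
        } else {
            out.push_back(a[i]);  // present in both sets
            ++i;
            ++j;
        }
    }
    return out;
}
```

The standard library's `std::set_intersection` in `<algorithm>` performs the same merge; it is written out here to show why both lists must share one sort order.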

The second approach works particularly well for per-pixel lighting, where the small, fixed number of sets an object can belong to can be stored as flags on the object. Then, instead of building a set, we operate on the list of objects. To build the visible set, we find all the objects in view and flag them as belonging to the visible set. To determine set L, we perform a similar process, but flag the objects as belonging to the lighting group. Set I is then found simply by walking the list of objects that the light can see and picking out those that are also flagged as being in the visible set. This second approach is much simpler to implement than the first, but the flags must be reset after the operation is complete.
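A sketch of the flag-based approach, assuming objects live in a single array addressed by index and that one byte of flags is enough; `markSet`, `clearSet`, and `illuminationSet` are illustrative names, not part of any existing API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One bit per set an object can belong to (a byte is assumed to suffice).
constexpr std::uint8_t kInVisibleSet = 1u << 0;  // set V (camera)
constexpr std::uint8_t kInLightSet   = 1u << 1;  // set L (current light)

struct Object {
    int id = 0;
    std::uint8_t flags = 0;
};

// Flags every object whose index appears in `ids`.
void markSet(std::vector<Object>& objects, const std::vector<int>& ids,
             std::uint8_t flag) {
    for (int i : ids) objects[i].flags |= flag;
}

// Clears the flag again so the next light or frame starts clean.
void clearSet(std::vector<Object>& objects, const std::vector<int>& ids,
              std::uint8_t flag) {
    for (int i : ids)
        objects[i].flags = static_cast<std::uint8_t>(objects[i].flags & ~flag);
}

// Set I: walk only the light's object list and keep objects flagged visible.
std::vector<int> illuminationSet(const std::vector<Object>& objects,
                                 const std::vector<int>& lightIds) {
    std::vector<int> result;
    for (int i : lightIds) {
        if (objects[i].flags & kInVisibleSet) result.push_back(i);
    }
    return result;
}
```

The visible flags can stay set across all lights in a frame and be cleared once at the end, while the light flags are cleared after each light.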

15.4.4 The Shadow Set

Shadow calculation is more difficult than the other operations because it involves volumes extruded away from a light. Objects are often represented by a bounding box or a bounding sphere that encloses their visible geometry. The region of space that an object's shadow can affect is found by extruding each point of the bounding primitive away from the light, along the ray from the light through that point, until its distance from the light equals the light's radius. Therefore, the shadow set must include every object whose extruded volume, out to the effective light radius, intersects the view frustum.
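The extrusion can be sketched for an axis-aligned bounding box. This minimal C++ illustration rests on assumptions not stated in the text: each of the box's eight corners is pushed away from the light to the light radius, and the result is wrapped in a new axis-aligned box. Because the far end of the true extruded volume is a patch of a sphere, a box around the sixteen points is only an approximation and can miss a thin sliver of that cap; extruding slightly past the radius is one way to make it conservative. Corners already at or beyond the radius are left in place.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Aabb { Vec3 min, max; };

// Moves p along the ray from the light through p until it is lightRadius away
// from the light. Points already at or beyond the radius (or exactly at the
// light, where the direction is undefined) are returned unchanged.
Vec3 extrudeAwayFromLight(Vec3 p, Vec3 light, float lightRadius) {
    float dx = p.x - light.x, dy = p.y - light.y, dz = p.z - light.z;
    float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    if (dist >= lightRadius || dist == 0.0f) return p;
    float s = lightRadius / dist;  // scale factor along the light-to-p ray
    return {light.x + dx * s, light.y + dy * s, light.z + dz * s};
}

// Axis-aligned box around the original corners and their extruded copies:
// an approximation of the region the object's shadow can reach.
Aabb shadowBounds(const Aabb& box, Vec3 light, float lightRadius) {
    Aabb out = box;
    for (int c = 0; c < 8; ++c) {
        Vec3 corner{(c & 1) ? box.max.x : box.min.x,
                    (c & 2) ? box.max.y : box.min.y,
                    (c & 4) ? box.max.z : box.min.z};
        Vec3 e = extrudeAwayFromLight(corner, light, lightRadius);
        out.min = {std::fmin(out.min.x, e.x), std::fmin(out.min.y, e.y),
                   std::fmin(out.min.z, e.z)};
        out.max = {std::fmax(out.max.x, e.x), std::fmax(out.max.y, e.y),
                   std::fmax(out.max.z, e.z)};
    }
    return out;
}
```

The resulting box can then be tested against the view frustum planes with any standard box-versus-plane test.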

Shadows for Lights Inside a Frustum

Lights always project shadows away from the position of the light. Therefore, if a light is located within the view frustum, it is safe to discard from the shadow set any object that is outside the view frustum. This is because if a light is inside a convex volume and an object is outside the volume, the object will always be extruded away from the convex volume and therefore cannot intersect the frustum.

We need to use the set of all objects in the view frustum to determine which objects cast shadows, not the visible set V, which is only a subset of the objects in the view frustum. Set V excludes occluded objects, which are not directly visible but can still cast shadows into visible regions, as illustrated in Figure 15-2. Therefore, a new set, called set F, is defined as all objects in the view frustum without any occlusion culling. If the light is in the frustum, the shadow set S is the intersection of sets F and L.
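Both steps, testing whether the light is inside the frustum and then forming S as the intersection of F and L, fit in a few lines. This sketch assumes planes stored as `dot(n, p) + d` with normals pointing into the frustum (so the inside half-space is `dot(n, p) + d >= 0`) and sets stored as ascending object-ID lists; the function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

struct Vec3 { float x, y, z; };
struct Plane { Vec3 n; float d; };  // inside half-space: dot(n, p) + d >= 0

float signedDistance(const Plane& pl, Vec3 p) {
    return pl.n.x * p.x + pl.n.y * p.y + pl.n.z * p.z + pl.d;
}

bool insideAll(const std::vector<Plane>& planes, Vec3 p) {
    for (const Plane& pl : planes)
        if (signedDistance(pl, p) < 0.0f) return false;
    return true;
}

// If the light is inside the frustum, S is the intersection of F (frustum set
// without occlusion culling) and L. Returns false when the light is outside,
// in which case the convex-hull test must be used instead.
bool shadowSetForInteriorLight(const std::vector<Plane>& frustum, Vec3 light,
                               const std::vector<int>& frustumSetF,
                               const std::vector<int>& lightSetL,
                               std::vector<int>& shadowSetS) {
    if (!insideAll(frustum, light)) return false;
    shadowSetS.clear();
    std::set_intersection(frustumSetF.begin(), frustumSetF.end(),
                          lightSetL.begin(), lightSetL.end(),
                          std::back_inserter(shadowSetS));
    return true;
}
```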


Figure 15-2 An Object Not in the Visible Set but in the Frustum, Casting Shadows

Shadows for Lights Outside a Frustum

Unfortunately, lights outside the frustum are not as simple, because such a light can project the shadows of objects outside the view frustum into the view frustum. However, this can happen only within a certain region of space: the convex hull around the view frustum and the position of the light source (Everitt and Kilgard 2003). Once this convex hull is determined, deciding whether an object needs to cast a shadow is simply a matter of testing whether it overlaps this convex region.
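Once the hull is available as a list of planes, the overlap test can be as cheap as a bounding-sphere check. This sketch assumes unit-length, inward-facing plane normals and bounding spheres for objects; the test is conservative, so it may keep a few spheres near the hull's corners that do not actually overlap, which costs some extra shadow rendering but never drops a needed shadow.

```cpp
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };
struct Plane { Vec3 n; float d; };  // unit normal; inside: dot(n, p) + d >= 0
struct Sphere { Vec3 center; float radius; };

float signedDistance(const Plane& pl, Vec3 p) {
    return pl.n.x * p.x + pl.n.y * p.y + pl.n.z * p.z + pl.d;
}

// Rejects the sphere only if it lies entirely outside at least one plane.
bool mayCastShadowIntoView(const std::vector<Plane>& hull, const Sphere& s) {
    for (const Plane& pl : hull) {
        if (signedDistance(pl, s.center) < -s.radius) return false;
    }
    return true;
}
```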

The trick to this approach is to determine the convex hull around the view frustum and the light position quickly. This is a constrained case of adding a point to an existing convex hull, so a simple solution can be used. The frustum begins as six planes defining a convex region of space; each plane has an inside half-space that includes the frustum and an outside half-space. Our goal is to create a volume that encloses everything that would be covered if lines were drawn from the light source to every point within the view frustum. Figure 15-3 illustrates the process.


Figure 15-3 Creating the Convex Hull Around the Light and View Frustum

The final convex hull will consist of every plane of the original view frustum whose inside half-space contains the light. In addition, it will contain, for each silhouette edge, a plane that passes through that edge and the light. These silhouette edges are simply edges where the light is in the inside half-space of one of the two planes that meet at the edge and in the outside half-space of the other. Both kinds of plane can be found simply and efficiently, but the view frustum needs some extra data beyond its six planes: its edges (each defined by two corner points), the edges that each plane partially defines, and the winding order of each edge with respect to each plane that contains it.

For each plane of the view frustum, the position of the light is checked. If the light is on the same side of the plane as the view frustum, that plane can be used directly in the convex hull. If the light is on the opposite side of the plane from the view frustum, the plane is discarded, and the count of each edge that the plane partially defines is incremented. Once this is done for all planes of the view frustum, we have a list of planes to use and a count for each edge of the view frustum.

The edge count serves to detect silhouette edges. If the count is zero, the light was inside both planes that define the edge, and no silhouette is cast from that edge. If it is two, the light was outside both defining planes, and again the edge does not contribute to the silhouette. If an edge has a count of one, it is a silhouette edge from the light's point of view, so a plane must be generated that contains the light position and both points of the edge. Generating this plane is where the winding order of the edges comes in. The plane that the light is outside of should store, with the edge, the edge's winding order relative to that plane. The winding order flags whether the edge runs clockwise or counterclockwise with respect to the facing of the plane, and it is used to ensure that the resulting plane faces the correct direction. This is needed because each edge is shared by two planes, and it is impossible to orient every edge so that its winding order is consistent with respect to both planes that define it.

The silhouette-edge portion of the algorithm produces a frustum that extends outward from the light and outlines the view frustum as seen from the light. On its own, this volume would stretch on forever, but it is bounded by the planes that contain the light in their inside half-space, which effectively form a cap and yield a convex hull around the view frustum and the light.

This pseudocode can quickly generate the convex hull:

Initialize all edge counts to zero
 
For each plane
  If the light is on the inside
    Add the plane to the final plane list
  Else
    For each edge the plane partially defines
      Increment the edge count
      Store the winding for this edge
 
For each edge
  If the edge count is equal to one
    Add a plane that includes the edge and the light point,
      flipping the plane normal if the winding is reversed
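The pseudocode translates fairly directly into C++. The sketch below adds assumptions not in the original: the frustum is given as its eight corner points, with corner i playing the role of unit-cube corner (i & 1, (i >> 1) & 1, (i >> 2) & 1); each face lists its corners counterclockwise as seen from outside; and instead of a separate edge list, the per-edge count and winding live in small 8 x 8 tables indexed by the edge's corner pair. The stored winding is the directed corner pair taken from the face the light is outside of, which fixes the orientation of the new plane without an explicit flip.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
struct Plane { Vec3 n; float d; };  // inside half-space: dot(n, p) + d >= 0

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
// Plane through p with the given (unnormalized) inward-pointing normal.
Plane planeFrom(Vec3 inward, Vec3 p) {
    float len = std::sqrt(dot(inward, inward));
    Vec3 n{inward.x / len, inward.y / len, inward.z / len};
    return {n, -dot(n, p)};
}

// Assumed topology: each face's corners, counterclockwise seen from outside.
constexpr int kFaces[6][4] = {{2, 0, 4, 6}, {1, 3, 7, 5}, {0, 1, 5, 4},
                              {2, 6, 7, 3}, {0, 2, 3, 1}, {4, 5, 7, 6}};

std::vector<Plane> hullOfFrustumAndLight(const std::array<Vec3, 8>& c,
                                         Vec3 light) {
    int edgeCount[8][8] = {};                    // [lower corner][higher corner]
    int edgeFrom[8][8] = {}, edgeTo[8][8] = {};  // stored winding for the edge
    std::vector<Plane> hull;

    for (const auto& f : kFaces) {
        const Vec3 &v0 = c[f[0]], &v1 = c[f[1]], &v2 = c[f[2]];
        Plane plane = planeFrom(cross(sub(v2, v0), sub(v1, v0)), v0);
        if (dot(plane.n, light) + plane.d >= 0.0f) {
            hull.push_back(plane);  // light inside: plane becomes part of the cap
            continue;
        }
        for (int k = 0; k < 4; ++k) {  // light outside: count this face's edges
            int a = f[k], b = f[(k + 1) % 4];
            int lo = std::min(a, b), hi = std::max(a, b);
            ++edgeCount[lo][hi];
            edgeFrom[lo][hi] = a;  // a -> b runs counterclockwise for this face
            edgeTo[lo][hi] = b;
        }
    }
    for (int lo = 0; lo < 8; ++lo) {
        for (int hi = lo + 1; hi < 8; ++hi) {
            if (edgeCount[lo][hi] != 1) continue;  // 0 or 2: not a silhouette
            Vec3 p = c[edgeFrom[lo][hi]], q = c[edgeTo[lo][hi]];
            // With p -> q wound counterclockwise around the face the light is
            // outside of, cross(light - p, q - p) points into the hull.
            hull.push_back(planeFrom(cross(sub(light, p), sub(q, p)), p));
        }
    }
    return hull;
}
```

With a unit cube standing in for the frustum and the light at (3, 0.5, 0.5), only the +x face is discarded, so the hull keeps five original planes and adds four silhouette planes, nine in all.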

15.5 Visibility for Fill Rate

Visibility can be used effectively to improve performance not only on the CPU, but also on the GPU. When we perform per-pixel lighting with stencil volumes, the fill rate consumed—by filling in stencil volumes or by rendering large objects multiple times—can quickly become a bottleneck. The best way to combat this bottleneck is to restrict the area that the card can render to by using a scissor rectangle. On the most recent NVIDIA cards, the scissor rectangle concept can be extended even further, to a depth test that acts like a z-scissor range and effectively emulates a scissor frustum in space.
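The z-scissor range mentioned above corresponds to OpenGL's `EXT_depth_bounds_test` extension, which, once `GL_DEPTH_BOUNDS_TEST_EXT` is enabled, rejects fragments whose stored depth value lies outside the range set with `glDepthBoundsEXT(zmin, zmax)`. The sketch below only computes that range for a light's sphere of influence. It assumes a standard OpenGL-style perspective projection, `glDepthRange(0, 1)`, and a light center already transformed to eye space (camera looking down -z); the function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Window-space depth in [0, 1] of an eye-space z (negative in front of the
// camera) under a perspective projection with near/far distances n and f.
float windowDepth(float zEye, float n, float f) {
    float zClip = -(f + n) / (f - n) * zEye - 2.0f * f * n / (f - n);
    float wClip = -zEye;
    return 0.5f * (zClip / wClip) + 0.5f;
}

struct DepthRange { float zmin, zmax; };

// Depth bounds of a light sphere centered at eye-space depth lightZ, clamped
// to the near/far planes. Returns false if the sphere lies entirely outside
// that range, in which case the light can be skipped.
bool lightDepthBounds(float lightZ, float radius, float n, float f,
                      DepthRange& out) {
    float nearZ = std::min(lightZ + radius, -n);  // closest to the camera
    float farZ = std::max(lightZ - radius, -f);   // farthest from the camera
    if (nearZ < farZ) return false;
    out = {windowDepth(nearZ, n, f), windowDepth(farZ, n, f)};
    return true;
}
```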

To create a scissor rectangle, project the bounds of the light onto the screen and restrict rendering to that region. However, even for medium-size lights, this method quickly loses most of its benefit, because the light covers most of the screen. It is particularly inefficient in tight areas with large lights. In such cases, the lights affect only a small portion of the screen, but because of their large radii, a naive implementation produces a much larger scissor rectangle than necessary. Picture a ventilation shaft with a light that shines down most of its length. To illuminate this large area, the light must have a large radius, which leads to a large scissor rectangle, even though the light illuminates only the shaft, which might occupy a small portion of the screen.

To make the scissor rectangle as tight as possible, we can use the sets outlined earlier. We already have the list of objects that the light will influence when it is rendered, and we can determine the bounding rectangle of a primitive projected onto the screen. So we can find a very tight scissor rectangle as follows: project each object affected by the light onto the screen, find the combined bounding rectangle of those objects, and intersect it with the projected bounding rectangle of the light itself. It's best to use simple bounding primitives (such as bounding spheres or axis-aligned bounding boxes) for these projections, because projecting more detailed geometry is costly and a tighter fit gains little.
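The rectangle arithmetic is small enough to sketch directly. This C++ version assumes the projection step has already produced an integer pixel rectangle for each object in the light's illumination set and for the light's own bounds; the projection itself and the `Rect` type are assumptions of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Screen-space rectangle in pixels; empty when x0 >= x1 or y0 >= y1.
struct Rect { int x0, y0, x1, y1; };

bool isEmpty(const Rect& r) { return r.x0 >= r.x1 || r.y0 >= r.y1; }

Rect unionRect(const Rect& a, const Rect& b) {
    return {std::min(a.x0, b.x0), std::min(a.y0, b.y0),
            std::max(a.x1, b.x1), std::max(a.y1, b.y1)};
}

Rect intersectRect(const Rect& a, const Rect& b) {
    return {std::max(a.x0, b.x0), std::max(a.y0, b.y0),
            std::min(a.x1, b.x1), std::min(a.y1, b.y1)};
}

// Union of the projected bounds of every lit object, clipped to the
// projected bounds of the light.
Rect tightScissor(const std::vector<Rect>& objectRects, const Rect& lightRect) {
    if (objectRects.empty()) return {0, 0, 0, 0};  // nothing to light
    Rect total = objectRects.front();
    for (const Rect& r : objectRects) total = unionRect(total, r);
    return intersectRect(total, lightRect);
}
```

An empty result (for example, when the illumination set is empty) means there is nothing to light, so the light's passes can be skipped.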

With these operations, even a light with a huge radius gets a scissor rectangle that is never larger than the screen-space area of the objects it affects. This approach can dramatically reduce fill-rate consumption and greatly help keep frame rates consistent across scenes.

15.6 Practical Application

All the techniques mentioned in this chapter were implemented in an existing system and yielded significant performance improvements. The visibility solution was a portal visibility scheme, which allowed the use of the same visibility system for the viewer and the lights.

Table 15-1 shows the normalized frame rate and the number of batches required for two scenes, with different components enabled. For each scene, results are shown when (a) only frustum-based visibility is used, with no occlusion culling; (b) visibility only for the camera is used; (c) visibility for lights and camera is used; (d) visibility for lights, camera, and shadows is used; and (e) scissor rectangles are added. Frame rates are normalized so that the last configuration is 1.00.

Table 15-1. The Effects of Different Visibility Techniques on Performance

Scene 1
Technique                   Normalized Frame Rate   Batches
Frustum-based visibility    0.87                    1171
Visibility occlusion        0.92                    492
Lighting occlusion          0.99                    468
Shadow occlusion            0.99                    460
Scissor rectangle           1.00                    460

Scene 2
Technique                   Normalized Frame Rate   Batches
Frustum-based visibility    0.56                    1414
Visibility occlusion        0.80                    521
Lighting occlusion          0.98                    438
Shadow occlusion            0.98                    437
Scissor rectangle           1.00                    437

As Table 15-1 shows, introducing visibility into a scene can dramatically improve the frame rate: in Scene 2, the normalized frame rate rises from 0.56 to 1.00 (roughly 1.8 times faster) while the batch count falls from 1414 to 437. Further testing showed that beyond a certain point the application became fill-rate-limited, so the remaining performance gains from visibility came from occlusion reducing the number of objects the video card had to rasterize.

15.7 Conclusion

The high number of batches and large amount of fill rate that per-pixel lighting requires mean that we need to minimize the number of rendered objects and the area of the screen they affect. By using any standard visibility algorithm together with the techniques illustrated in this chapter, we can substantially improve performance.

15.8 References

Akenine-Möller, Tomas, and Eric Haines. 2002. Real-Time Rendering, 2nd ed. A. K. Peters. See the discussion of visibility algorithms on pp. 345–389.

Everitt, Cass, and Mark J. Kilgard. 2003. "Optimized Stencil Shadow Volumes." Presentation at Game Developers Conference 2003. Available online at http://developer.nvidia.com/docs/IO/8230/GDC2003_ShadowVolumes.pdf

Wloka, Matthias. 2003. "Batch, Batch, Batch: What Does It Really Mean?" Presentation at Game Developers Conference 2003. Available online at http://developer.nvidia.com/docs/IO/8230/BatchBatchBatch.pdf