Matt Pharr
NVIDIA
This chapter describes a technique for computing an approximation to the derivative of arbitrary quantities in fragment programs. This technique loads special values into each level of a texture mipmap and uses the texture-mapping hardware in an unusual way to cause it to return the results of a derivative computation, giving an indication of how quickly the quantity is changing from pixel to pixel on the screen. Computing derivatives such as these is a critical operation to perform in shaders that compute procedural patterns, because it allows them to perform antialiasing on the patterns they generate by prefiltering them to remove detail that can't be represented by the pixel-sampling rate.
Cg's Standard Library provides ddx() and ddy() functions that compute the derivative of an arbitrary quantity with respect to the x and y pixels. In other words, given some arbitrary variable v, calling ddx(v) tells you approximately how much v will change between the current pixel and the next pixel in the x direction; the same is true for ddy(v) for the y direction. Unfortunately, some hardware profiles (for example, arbfp1) do not support these derivative computations, which makes it very difficult to write shaders that don't suffer from aliasing artifacts when using those profiles. This chapter gives a solution to this deficiency based on the texture-mapping functionality that those profiles do provide.
Consider a simple procedural checkerboard function that returns a value of 0 or 1 based on the (u, v) texture coordinates at a point:
float checker(float2 uv)
{
    return (fmod(floor(uv.x) + floor(uv.y), 2) < 1) ? 0 : 1;
}
This function computes the integer components of the coordinates of the cell that the (u, v) point is in and alternates between returning 0 and 1 at adjacent cells. The result is the checkerboard pattern shown in Figure 25-1b. With programmable graphics hardware, it's no longer necessary to encode patterns like this in a texture map; the texture function can potentially be evaluated in a fragment program instead. (For a simple checkerboard, there's no real reason to do this procedurally, but in general, procedural texturing opens up a new level of flexibility in real-time rendering. In any case, the same antialiasing principles we will apply to the checkerboard apply for more complex procedural textures.)
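The same cell-parity test can be evaluated on the CPU as well; the following is a hypothetical C++ translation of the Cg function above (not one of the chapter's listings), useful for checking the alternation between adjacent cells:

```cpp
#include <cmath>

// C++ analogue of the Cg checker(): returns 0 or 1 depending on which
// cell of the unit checkerboard the (u, v) point falls in. Cells whose
// integer coordinates have an even sum return 0; odd sums return 1.
float checker(float u, float v)
{
    return (std::fmod(std::floor(u) + std::floor(v), 2.0f) < 1.0f) ? 0.0f : 1.0f;
}
```

For example, the point (0.5, 0.5) lies in cell (0, 0), whose coordinate sum is even, so checker() returns 0 there, while (1.5, 0.5) lies in cell (1, 0) and returns 1.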
Figure 25-1 Procedural Checkerboard Texture, Without and With Antialiasing
One drawback of procedural textures is that the author of the shader needs to be careful to antialias the shader. Roughly speaking, this means that instead of computing the value of the procedural function at a single point, it is necessary to compute the average value of the function over an area. (Chapter 24, "High-Quality Filtering," discusses this topic in more depth.) Specifically, one needs to compute the average value of the function over the area between the current pixel and the adjacent pixels. If the fragment shader doesn't take care of this work, the image will be aliased, as is seen in Figure 25-1a. There, not only does the checkerboard break up into ugly patterns at the horizon, but the edges of the closer checkers have staircase artifacts as well. These errors look even worse when the viewer or the object is moving. (This aliasing is similar to the difference between using a point-sampling texture filtering mode when a texture map is minified instead of using trilinear filtering.)
To eliminate these errors, we need to compute the average color of the checkerboard over a small area on the screen around each pixel. The basic filterwidth() function shown below makes it easy to compute how quickly any value is changing from pixel to pixel, and thus the area over which the procedural texture needs to filter. When called with the uv coordinates in the checker() function shown earlier, the value it returns gives a single approximation of how much they will change between the current pixel and its neighbors. (Thus, it returns an isotropic filter width, as opposed to a more accurate anisotropic width, which may have a different extent along different directions.)
float filterwidth(float2 v)
{
    float2 fw = max(abs(ddx(v)), abs(ddy(v)));
    return max(fw.x, fw.y);
}
In particular, we can assume that if uv is the value of the texture coordinates at one pixel and uv1 is the value at a neighbor, then their change is bounded by the filter width:
abs(uv.x - uv1.x) < filterwidth(uv)
abs(uv.y - uv1.y) < filterwidth(uv)
This should be true regardless of the geometry of the object being rendered or how uv was computed—it need not just be a value passed in from a vertex program, for instance. The filterwidth() function will generally overestimate the change in the value passed to it (leading to filtering over too large an area and an overblurred result) rather than underestimate it (which would lead to filtering over too small an area and an aliased result). Graphics APIs such as OpenGL and Direct3D make this same trade-off when specifying texture-filtering behavior.
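What ddx() and ddy() compute can be mimicked on the CPU with forward differences across a 2x2 pixel quad. The sketch below is a hypothetical helper (not from the chapter) that takes the uv values at the current pixel and at its +x and +y neighbors and combines them exactly as filterwidth() does:

```cpp
#include <algorithm>
#include <cmath>

// CPU sketch of filterwidth(): ddx and ddy are approximated by forward
// differences of uv between a pixel (u00, v00) and its neighbors in +x
// (u10, v10) and +y (u01, v01). We take the per-component max of the
// absolute differences, then the max over the two components.
float filterWidthFromQuad(float u00, float v00,
                          float u10, float v10,
                          float u01, float v01)
{
    float ddxU = std::abs(u10 - u00), ddxV = std::abs(v10 - v00);
    float ddyU = std::abs(u01 - u00), ddyV = std::abs(v01 - v00);
    float fwU = std::max(ddxU, ddyU);
    float fwV = std::max(ddxV, ddyV);
    return std::max(fwU, fwV);
}
```

If u changes by 0.25 per pixel in x and v changes by 0.5 per pixel in y, the filter width is 0.5, the larger of the two rates of change, which is what makes the bound above hold in both directions.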
To compute the antialiased checkerboard, instead we would like to compute the average color over the 2D range of (u, v) coordinates from uv - .5 * filterwidth(uv) to uv + .5 * filterwidth(uv). The code in Listing 25-1 implements this computation, which is effectively the integral of the function over the filter region divided by the size of the filter region. (The "Further Reading" section at the end of this chapter has pointers to additional information about how this method works.) An image rendered using this version of the checker() function is shown in Figure 25-1b. Note that both of the aliasing errors in the original checkerboard no longer appear in this image. The edges of the checks in the middle of the image are slightly overblurred, but this is a much less visually objectionable error.
float checker(float2 uv)
{
    float width = filterwidth(uv);
    float2 p0 = uv - .5 * width, p1 = uv + .5 * width;
#define BUMPINT(x) \
    (floor((x) / 2) + 2.f * max(((x) / 2) - floor((x) / 2) - .5f, 0.f))
    float2 i = (BUMPINT(p1) - BUMPINT(p0)) / width;
    return i.x * i.y + (1 - i.x) * (1 - i.y);
}
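To see why Listing 25-1 works, note that BUMPINT(x) is the integral from 0 to x of a one-dimensional square wave with period 2 that is 0 on [0, 1) and 1 on [1, 2); the difference of two BUMPINT values divided by the interval width is therefore the average of that wave over the filter interval. The following hypothetical C++ check (not from the chapter) verifies that behavior:

```cpp
#include <algorithm>
#include <cmath>

// BUMPINT(x): integral from 0 to x of a period-2 square wave that is
// 0 on [0, 1) and 1 on [1, 2), matching the macro in Listing 25-1.
float bumpint(float x)
{
    return std::floor(x / 2) +
           2.0f * std::max(x / 2 - std::floor(x / 2) - 0.5f, 0.0f);
}

// Average value of the square wave over [p0, p1]: the box-filtered
// one-dimensional checker.
float boxFilteredWave(float p0, float p1)
{
    return (bumpint(p1) - bumpint(p0)) / (p1 - p0);
}
```

Over [0, 1) the average is 0, over [1, 2) it is 1, and over any whole number of periods it is exactly 0.5—which is why the filtered checkerboard settles to a uniform gray at the horizon in Figure 25-1b.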
Unfortunately, our filterwidth() function shown earlier will not compile on profiles that don't support the ddx() and ddy() functions. However, we can trick the texture-mapping hardware into computing essentially the same value that filterwidth() does. The key to this trick is the fact that texture-map lookups done with functions such as tex2D() automatically antialias the texture lookup, regardless of the texture coordinates that are passed in to them. For example, if you call tex2D(map, float2(sin(u), tan(v))), even though the actual texture coordinates are changing in unusual ways from pixel to pixel, a properly filtered texture value will still be returned at each pixel. The hardware can determine the appropriate area of the texture to filter over for each individual lookup, based on the texture coordinates that are computed at adjacent pixels.
This feature of texture mapping can be used to compute the filterwidth() function without calling ddx() and ddy(). We will perform a texture lookup in a way that lets us deduce the area of the texture map that was filtered; this tells us the area of the procedural checkerboard texture we need to filter, as well. There are two components to our solution: (1) determining which mipmap level was chosen for a particular set of texture coordinates and then (2) using that knowledge to determine the filter width.
For the first part of the problem, consider a square texture with eight levels in its mipmap. There are 128x128 texels at the base level, 0, of the mipmap, 64x64 at the next level, and so on, up to the top level, 7, where there is a single texel. If we explicitly assign all values in each level of the mipmap so that level 0 has the value 0 in all of its texels, level 1 has the value 1, and so forth, then whenever we do a texture lookup with this texture, the value returned will tell us which mipmap level was actually chosen by the texture-mapping hardware for that particular lookup. In general, the texture-mapping hardware blends between two levels of the mipmap, so a fractional blend between values at two adjacent levels will be returned.
We will use this basic idea, but instead, let's encode the base-2 logarithm of the width of the texture map in texels at each level. Thus, the top level of the map has a value of 0 in its single texel, the level below it has a value of 1, the level below that has a value of 2, and so on. We will also scale this value up by 16, in order to cover more of the precision available in 8-bit-per-channel texture maps. Listing 25-2 shows the C++ source code for OpenGL that loads up a texture map in this manner.
Thus, when we do a texture lookup with this texture map in a Cg program, we can find out the base-2 logarithm of the width of the mipmap level used for texture filtering.
The key observation behind this technique is that mipmapped texture filtering chooses a mipmap level based on how quickly the given texture coordinates are changing from pixel to pixel, calculated so that the texels at the chosen mipmap level have about the same spacing as the texture coordinates from adjacent pixels. In other words, if we can determine the texel spacing for the selected mipmap level, then we know approximately how much the texture coordinates are changing from pixel to pixel. Because we have stored the base-2 logarithm of the number of texels at each level of the mipmap, the spacing between texels is just 1/exp2(logNumTexels) == exp2(-logNumTexels). This is our desired filter width.
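The relationship between the stored per-level value and the recovered filter width is a simple power of two. This hypothetical sketch (assuming the 10-level mipmap of Listing 25-2, where the base level is 512x512) makes the correspondence concrete:

```cpp
#include <cmath>

// A mipmap level with 2^log2Width texels across [0, 1] texture space has
// adjacent texels spaced 1 / 2^log2Width apart; that spacing is the
// filter width the hardware's level selection implies.
float filterWidthForLevel(int log2Width)
{
    return std::exp2(-static_cast<float>(log2Width));
}
```

The 1x1 top level (log2Width of 0) implies a filter width of 1, the full 512-texel base level (log2Width of 9) a filter width of 1/512, and intermediate levels the powers of two in between.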
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST_MIPMAP_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
#define LEVELS 10
#define RES (1 << (LEVELS - 1))
GLubyte *data = new GLubyte[RES * RES];
for (int level = 0; level < LEVELS; ++level)
{
    int res = (1 << (LEVELS - 1 - level));
    int log2Width = LEVELS - 1 - level;
    for (int i = 0; i < res * res; ++i)
        data[i] = 16 * log2Width;
    glTexImage2D(GL_TEXTURE_2D, level, 1, res, res, 0, GL_RED,
                 GL_UNSIGNED_BYTE, data);
}
delete[] data;
Thus, given a texture map initialized like the previous one bound to a sampler2D called filterMap, the Cg code in Listing 25-3 first finds the log-2 width of the mipmap level that would be used for filtering a texture lookup with the given texture coordinates. Note that the value returned by tex2D() will be between 0 and 1; the data values passed to glTexImage2D() above are effectively divided by 255 to put them in this range. Thus, we need to correct for both this scaling and the extra scaling by a factor of 16 that we did when filling in the mipmap levels to get the actual log-2 width in the Cg program.
uniform sampler2D filterMap;
float filterwidth(float2 uv)
{
    float log2width = (255. / 16.) * tex2D(filterMap, uv).x;
    return exp2(-log2width);
}
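The decoding in Listing 25-3 simply undoes the two scale factors in sequence: glTexImage2D() stored 16 * log2Width in an 8-bit texel, tex2D() returns that texel value divided by 255, so multiplying by 255/16 recovers log2Width. A hypothetical CPU round-trip of the encode and decode steps (not from the chapter) confirms the arithmetic:

```cpp
#include <cmath>

// Encode as in Listing 25-2: an 8-bit texel holds 16 * log2Width.
unsigned char encodeLevel(int log2Width)
{
    return static_cast<unsigned char>(16 * log2Width);
}

// Decode as in Listing 25-3: the sampled value is texel / 255; scaling
// by 255/16 recovers log2Width, and exp2(-log2Width) is the filter width.
float decodeFilterWidth(unsigned char texel)
{
    float log2width = (255.0f / 16.0f) * (texel / 255.0f);
    return std::exp2(-log2width);
}
```

Encoding level width 2^7 = 128 and decoding it back yields a filter width of 1/128, up to floating-point rounding in the divide by 255.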
Using this filter width function instead of the original gives a visually indistinguishable result. It compiles down to three fragment shader instructions, though in the context of a complete shader, we have not seen a measurable performance difference between the two.
This approach can work well for computing filter width values in many situations. Its performance is nearly the same as the derivative-based approach for computing filterwidth(), though at a cost of using a texture unit and a megabyte or so of texture memory. Because the texture levels hold constant values, the textures used are potentially excellent candidates for compressed texture formats, which would reduce memory use and the GPU bandwidth needed at runtime.
The main drawback of this approach is that, unlike the standard technique, this method may give incorrect results when the filter width should be very large or very small. Consider a rapidly changing value whose filter width should be 2: this technique will underestimate it, because the top level of the mipmap is chosen, and the result is indistinguishable from that of a quantity with a filter width of 1. Similarly, we may overestimate the filter width for a very slowly changing value: the most detailed level of the mipmap limits the smallest filter width that can be computed, as well. If a general bound on the range of expected filter width values is known, however, this problem may not be very troublesome in practice. Alternatively, if we know that the filtered version of the procedural texture takes on a single average value for all filter widths beyond some point, we don't need to worry about that case.
We can work around these problems by detecting when we have a result from the top or bottom level of the mipmap (corresponding to texture mipmap level values of 0 and the number of mipmap levels minus 1, respectively). In that case, we can try another texture lookup with scaled texture coordinates, compute the filter width for those coordinates, and then rescale the filter width to compensate for the scaling of texture coordinates. The code in Listing 25-4 shows how this can be done for the case of underestimated filter widths resulting from going off the top of the mipmap pyramid. The analogous approach can be used to solve the problem of going off the bottom of the pyramid, as well.
float filterwidth(float2 uv)
{
    float log2width = (255. / 16.) * tex2D(filterMap, uv).x;
    if (log2width < .01)
    {
        log2width = (255. / 16.) * tex2D(filterMap, uv / 512).x;
        return 512 * exp2(-log2width);
    }
    else
        return exp2(-log2width);
}
The scaling factor 512 used here was based on the fact that we loaded a ten-level mipmap in Listing 25-2. If more or fewer levels are available, we would need to adjust the scaling factor correspondingly.
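The rescaling in Listing 25-4 can be sanity-checked arithmetically: dividing the texture coordinates by 512 makes them change 512 times more slowly, so the hardware selects a level whose implied filter width is 512 times smaller, and multiplying the result by 512 undoes the scaling. The sketch below is a hypothetical numeric model of that retry logic (it simulates ideal level selection for a 10-level map; it is not shader code from the chapter):

```cpp
#include <cmath>

// Simulate ideal mipmap level selection for a 10-level map: the hardware
// would pick log2width = -log2(delta), where delta is the per-pixel
// change in uv, clamped to the available levels [0, 9].
float lookupFilterWidth(float delta)
{
    float log2width = -std::log2(delta);
    log2width = std::fmin(std::fmax(log2width, 0.0f), 9.0f);
    return std::exp2(-log2width);
}

// Listing 25-4's workaround: a result from the top of the pyramid
// (filter width 1) may be an underestimate, so retry with coordinates
// scaled down by 512 and rescale the recovered width.
float filterWidthWithRetry(float delta)
{
    float fw = lookupFilterWidth(delta);
    if (fw == 1.0f)
        fw = 512.0f * lookupFilterWidth(delta / 512.0f);
    return fw;
}
```

For a quantity changing by 2 per pixel, the direct lookup clamps at the top level and reports 1, while the retry with scaled coordinates lands at a valid interior level and correctly reports 2.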
Ebert, David S., F. Kenton Musgrave, Darwyn Peachey, Ken Perlin, and Steven Worley. 2003. Texturing and Modeling: A Procedural Approach. Morgan Kaufmann. Chapter 2 of this book has an excellent introduction to issues related to antialiasing in procedural shaders.
Apodaca, Anthony A., and Larry Gritz, eds. 1999. Advanced RenderMan: Creating CGI for Motion Pictures. Morgan Kaufmann. This is another excellent resource for these issues.
Williams, Lance. 1983. "Pyramidal Parametrics." In Computer Graphics (Proceedings of SIGGRAPH 83) 17(3), pp. 1–11. Williams introduced mipmaps to the graphics world in this article.
Fernando, Randima, Sebastian Fernandez, Kavita Bala, and Donald P. Greenberg. 2001. "Adaptive Shadow Maps." In Proceedings of ACM SIGGRAPH 2001, pp. 387–390. This paper contains the first similar abuse of mipmapping that we are aware of; the authors used a comparable technique to determine the amount of magnification in shadow maps.
Hadwiger, Markus, Helwig Hauser, Thomas Theußl, and Meister Eduard Gröller. 2003. "MIP-Mapping with Procedural and Texture-Based Magnification." Sketch presented at SIGGRAPH 2003. More recently, the authors of this sketch used a similar technique to detect texture magnification in a fragment program in order to manually apply a bicubic filter, resulting in better images with magnified textures.