# GPU Gems 3

**GPU Gems 3** is now available for free online!

The CD content, including demos and content, is available on the web and for download.


# Part VI: GPU Computing

In *GPU Gems 3*, we continue to showcase work that uses graphics hardware for nongraphics computation. As each new generation provides significantly greater computing power and programmability, GPUs are increasingly attractive targets for general-purpose computation, or what is commonly called *GPGPU* or *GPU Computing*. As a result, researchers and developers in academia and industry continue to develop new GPU algorithms for tasks such as sorting, database operations, image processing, and linear algebra. In many cases, the principal motivation for using the GPU is the prospect of high performance at a relatively low cost.

GPU programming tools have evolved dramatically over the past few years. Recently, NVIDIA launched a new set of tools for GPU Computing with the introduction of its CUDA technology. CUDA provides a flexible programming model and C-like language for implementing data-parallel algorithms on the GPU. What's more, NVIDIA's CUDA-compatible GPUs have additional hardware features specifically designed to boost performance and give users more control over how algorithms are mapped to the GPU. In many ways, CUDA is an important step forward in widening the domain of algorithms that can benefit from GPU performance. This part of the book contains a mix of new applications using CUDA, in addition to graphics-based GPGPU using languages like Cg.

We begin this section with a look at the role of GPUs in network security. For network virus detection systems, there is a tradeoff between fast, expensive solutions using specialized processors and low-cost alternatives based on commodity CPUs. In Chapter 35, "Fast Virus Signature Matching on the GPU," Elizabeth Seamans of Juniper Networks and Thomas Alexander of Polytime present a high-performance, GPU-based virus scanning library. The system uses the GPU as a fast filter to quickly identify possible virus signatures for thousands of data objects in parallel. The performance of their library suggests that the GPU is now a viable platform for cost-effective, high-performance network security processing.

In Chapter 36, "AES Encryption and Decryption on the GPU," Takeshi Yamanouchi of SEGA Corporation describes his work on implementing encryption algorithms on the GPU. AES (Advanced Encryption Standard) is the current standard for block cipher encryption, and, like many encryption algorithms, it relies heavily on integer operations. The author describes how to use the integer-processing capabilities of NVIDIA's GeForce 8800 GPUs to accelerate AES encryption and decryption.

Many software systems, including particle physics simulators and stochastic ray tracers, rely on Monte Carlo methods to efficiently solve problems involving complex, multidimensional functions. Fast and accurate random number generation is a critical component of all Monte Carlo simulations. In Chapter 37, "Efficient Random Number Generation and Application Using CUDA," Lee Howes and David Thomas of Imperial College London present methods for generating random numbers using CUDA to exploit the massive parallelism and arithmetic performance of the GPU. They describe the relative advantages of two fast algorithms for generating Gaussian random numbers, techniques that are particularly useful in financial simulations for pricing stock options.

Companies in the oil and gas industry depend on accurate seismic surveys of the Earth to identify subsurface oil reservoirs. The challenge is that most seismic data sets are many terabytes in size and it takes enormous amounts of computing power to convert the raw data into useful survey images. In Chapter 38, "Imaging Earth's Subsurface Using CUDA," Bernard Deschizeaux and Jean-Yves Blanc of CGGVeritas describe a CUDA implementation of several time-critical algorithms within their industrial seismic processing pipeline. Their CUDA implementation achieves significant performance improvements over the latest generation of CPUs, and the authors discuss the possibility of building clusters of GPUs to accelerate large seismic processing problems.

A number of commonly used algorithms in computer science involve a simple operation called *all-prefix-sum*, or *scan*. For each value in an array of data, the scan operation computes the sum of all preceding values. In Chapter 39, "Parallel Prefix Sum (Scan) with CUDA," Mark Harris of NVIDIA and Shubhabrata Sengupta and John D. Owens of the University of California, Davis, describe an efficient CUDA implementation of a parallel scan algorithm and provide results for applications such as stream compaction and radix sort. This chapter is also a good reference for developers to learn CUDA programming and optimization strategies.

The Gaussian function is one of the most widely used filter kernels in image and signal processing. The exponential term makes the Gaussian expensive to evaluate dynamically, so in practice it is common to precompute a table of coefficients. In Chapter 40, "Incremental Computation of the Gaussian," Ken Turkowski of Adobe Systems presents a method to quickly evaluate the Gaussian on the fly using a technique similar to polynomial forward differencing. By replacing differences with quotients, this algorithm incrementally computes Gaussian coefficients. For a GPU implementation, this approach eliminates a texture lookup in the pixel shader, which can result in faster filtering performance.
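To make the "quotients instead of differences" idea concrete, here is a minimal Python sketch. It is not the chapter's implementation: the function name `incremental_gaussian`, its parameters, and the choice of an unnormalized Gaussian exp(-x²/(2σ²)) sampled at evenly spaced points are all assumptions made for illustration. The ratio between consecutive samples changes by a constant factor from one step to the next, so after three initial exponentials the loop needs only two multiplications per coefficient.

```python
import math

def incremental_gaussian(x0, delta, sigma, count):
    """Sample exp(-x^2 / (2 sigma^2)) at x0, x0 + delta, ... (hypothetical helper).

    Only the setup calls exp(); the loop uses multiplications alone.
    """
    two_sigma_sq = 2.0 * sigma * sigma
    # g: current Gaussian value at x0.
    g = math.exp(-x0 * x0 / two_sigma_sq)
    # q: quotient g(x0 + delta) / g(x0), the analogue of a first difference.
    q = math.exp(-(2.0 * x0 * delta + delta * delta) / two_sigma_sq)
    # r: quotient of successive quotients, constant for a Gaussian
    # (the analogue of a constant second difference).
    r = math.exp(-2.0 * delta * delta / two_sigma_sq)

    values = []
    for _ in range(count):
        values.append(g)
        g *= q  # advance the Gaussian by one step
        q *= r  # advance the quotient by one step
    return values

# Nine filter taps centered on zero, assuming sigma = 1.5 and unit spacing.
taps = incremental_gaussian(-4.0, 1.0, 1.5, 9)
```

Because every value is a running product, rounding error accumulates slowly over long runs. For the short kernels typical of image filtering this is usually negligible, but that is an assumption worth checking for a given precision.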
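The scan operation mentioned for Chapter 39 can also be sketched in a few lines. The Python below is a sequential reference under stated assumptions, not the parallel CUDA implementation the chapter describes. The names `exclusive_scan` and `compact` are hypothetical. It shows the definition given above (each output is the sum of all preceding inputs) and how stream compaction can use the scan results as output positions.

```python
def exclusive_scan(values):
    """Return a list whose element i is the sum of values[0:i]."""
    result = []
    running = 0
    for v in values:
        result.append(running)  # sum of everything before this element
        running += v
    return result

def compact(items, keep):
    """Stream compaction: keep items passing `keep`, preserving their order."""
    flags = [1 if keep(x) else 0 for x in items]
    offsets = exclusive_scan(flags)  # output slot for each kept item
    total = offsets[-1] + flags[-1] if items else 0
    out = [None] * total
    for x, flag, slot in zip(items, flags, offsets):
        if flag:
            # Each kept item writes to its own slot, so on a GPU these
            # writes could proceed independently of one another.
            out[slot] = x
    return out

print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))  # [0, 3, 4, 11, 11, 15, 16, 22]
print(compact([5, 0, 2, 0, 0, 9], lambda x: x != 0))  # [5, 2, 9]
```

The compaction step is where the scan pays off: once the offsets are known, no element needs to know how many earlier elements were kept, which is what makes the pattern suitable for data-parallel hardware.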

Chapter 41, "Using the Geometry Shader for Compact and Variable-Length GPU Feedback," completes this section by describing how to use a new hardware feature in DirectX 10-compliant GPUs to implement algorithms that cannot be implemented efficiently using pixel or vertex shaders. The geometry shader is an extra stage in the GPU rendering pipeline that is capable of executing algorithms with variable, data-dependent input and output. This capability is particularly useful for computer vision applications that analyze images to identify geometric shapes. In this chapter, Franck Diard of NVIDIA presents geometry shader implementations of several algorithms, including histogram building and corner detection.

This section provides a small sampling of recent work on GPGPU techniques. Even with rapidly evolving architectures and programming tools like NVIDIA's CUDA, GPUs remain fairly specialized for data-parallel computation. However, it is clear that many important algorithms in scientific computing and other fields have enough parallelism to benefit from GPU performance, and it's likely that new algorithms will emerge as GPUs become more general and easier to program. As the chapters in this section demonstrate, the price/performance ratio of graphics processors is a potentially disruptive force in high-performance computing and in other computing industries.

Nolan Goodnight, NVIDIA Corporation

- Contributors
- Foreword
- Part I: Geometry
- Chapter 1. Generating Complex Procedural Terrains Using the GPU
- Chapter 2. Animated Crowd Rendering
- Chapter 3. DirectX 10 Blend Shapes: Breaking the Limits
- Chapter 4. Next-Generation SpeedTree Rendering
- Chapter 5. Generic Adaptive Mesh Refinement
- Chapter 6. GPU-Generated Procedural Wind Animations for Trees
- Chapter 7. Point-Based Visualization of Metaballs on a GPU

- Part II: Light and Shadows
- Chapter 8. Summed-Area Variance Shadow Maps
- Chapter 9. Interactive Cinematic Relighting with Global Illumination
- Chapter 10. Parallel-Split Shadow Maps on Programmable GPUs
- Chapter 11. Efficient and Robust Shadow Volumes Using Hierarchical Occlusion Culling and Geometry Shaders
- Chapter 12. High-Quality Ambient Occlusion
- Chapter 13. Volumetric Light Scattering as a Post-Process

- Part III: Rendering
- Chapter 14. Advanced Techniques for Realistic Real-Time Skin Rendering
- Chapter 15. Playable Universal Capture
- Chapter 16. Vegetation Procedural Animation and Shading in Crysis
- Chapter 17. Robust Multiple Specular Reflections and Refractions
- Chapter 18. Relaxed Cone Stepping for Relief Mapping
- Chapter 19. Deferred Shading in Tabula Rasa
- Chapter 20. GPU-Based Importance Sampling

- Part IV: Image Effects
- Chapter 21. True Impostors
- Chapter 22. Baking Normal Maps on the GPU
- Chapter 23. High-Speed, Off-Screen Particles
- Chapter 24. The Importance of Being Linear
- Chapter 25. Rendering Vector Art on the GPU
- Chapter 26. Object Detection by Color: Using the GPU for Real-Time Video Image Processing
- Chapter 27. Motion Blur as a Post-Processing Effect
- Chapter 28. Practical Post-Process Depth of Field

- Part V: Physics Simulation
- Chapter 29. Real-Time Rigid Body Simulation on GPUs
- Chapter 30. Real-Time Simulation and Rendering of 3D Fluids
- Chapter 31. Fast N-Body Simulation with CUDA
- Chapter 32. Broad-Phase Collision Detection with CUDA
- Chapter 33. LCP Algorithms for Collision Detection Using CUDA
- Chapter 34. Signed Distance Fields Using Single-Pass GPU Scan Conversion of Tetrahedra

- Part VI: GPU Computing
- Chapter 35. Fast Virus Signature Matching on the GPU
- Chapter 36. AES Encryption and Decryption on the GPU
- Chapter 37. Efficient Random Number Generation and Application Using CUDA
- Chapter 38. Imaging Earth's Subsurface Using CUDA
- Chapter 39. Parallel Prefix Sum (Scan) with CUDA
- Chapter 40. Incremental Computation of the Gaussian
- Chapter 41. Using the Geometry Shader for Compact and Variable-Length GPU Feedback

- Preface