
Last Updated: 10 / 13 / 2008

Real-Time Performance Optimization

Performance Documentation from NVIDIA

Fashions may come and go, but horsepower never goes out of style. The listings below cover some of NVIDIA's work on performance tools, data amplification, and other techniques and technologies aimed at delivering the highest possible graphics and computing performance for developers.

Documentation

Samples from NVIDIA Graphics SDK 10.5:
Skinned Instancing (Whitepaper)
This sample shows the use of instancing and vertex texture fetch on the GeForce 8 Series to implement a crowd of GPU animated characters, all independently animating but drawn with a single draw call.
Instanced Tessellation
This example shows how to simulate tessellation using instancing. Per-patch tessellation levels are implemented as described by Dyken et al. in Semi-uniform Adaptive Patch Tessellation.
The example uses tessellation to render a displaced subdivision surface, using a precomputed Bezier approximation to the Catmull-Clark surface. Control points of the Bezier mesh are computed using the algorithm by Loop and Schaefer in Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches.
Samples from NVIDIA Graphics SDK 9.52:
SLI Best Practices (Whitepaper)
This code sample demonstrates the proper way to detect SLI-configured systems, as well as how to achieve maximum performance benefit from SLI.
DXSAS Sample Implementation 0.8 (User Guide)
This running code sample demonstrates how to implement a DirectX Semantics and Annotations (DXSAS) ScriptExecute parser in an engine. Full support for the standard annotations and semantics is provided. The user interface lets you apply multiple scene and model effects simultaneously, so you can see hundreds of different effect combinations. All effect files were developed using FX Composer.
Instancing (Whitepaper)
This sample uses Microsoft's Direct3D9 Instancing Group to render thousands of meshes on the screen with only a handful of draw calls. This significantly reduces the CPU overhead of submitting many separate draw calls and is a great technique for rendering trees, rocks, grass, RTS units and other groups of similar (but not necessarily identical) objects.
Pseudo Instancing in OpenGL (Whitepaper)
This sample demonstrates a technique for speeding up the rendering of instanced geometry with GLSL.
Query Sample (User Guide)
Shows how to check for availability and use of the various query types supported in DirectX9. This sample queries for and displays results for queries of type: event, occlusion, timestamp, timestamp frequency, timestamp disjoint, and if running with the debug runtime, resource and vertex stats.
Get GPU and System Info (Whitepaper)
This sample queries Microsoft's IDXDiagContainer interface to retrieve graphics hardware and system information. Most notable is the retrieval of the amount of physical video memory on the primary graphics device. The IDXDiagContainer interface is wrapped in a convenient C++ class, and no IDirect3DDevice9 object is required to retrieve the information.
PBO Texture Performance (User Guide)
Explores various ways to use OpenGL pixel buffer objects (PBOs). This code sample can also be used to see the maximum bus-transfer rates for textures to and from the GPU.
NVTriStrip Test Application (User Guide)
This simple example demonstrates the use of the NvTriStrip optimizer library.
Normalization Heuristics (Whitepaper)
This entry and the accompanying technical report answer the question: "When is cube-map normalization faster than normalize()?" The report describes experiments performed with a non-trivial pixel shader, and uses the experimental results to derive useful rules of thumb regarding the performance and quality of normalization in pixel shaders. These heuristics provide tuning dials that developers can use to trade quality for performance (and vice versa) in 3D applications. To help you gain an intuitive understanding of these performance-quality tradeoffs, the entry application is provided so that you can run the same experiments described in the report.
Atlas Comparison Viewer (Whitepaper)
This sample compares texturing from regular textures versus texturing from an atlas. Putting multiple textures in an atlas can help reduce draw calls, thus decreasing CPU load. Comparisons are made with respect to both image quality and performance.
Occlusion Query - OpenGL
This sample illustrates occlusion query using a simple sphere and a plane.
Occlusion Query - DirectX (Whitepaper)
This sample shows usage of occlusion queries to cull out complex objects and save bandwidth to the card. Occlusion queries report how many pixels a set of draw calls actually wrote to.
NVDeviceID (User Guide)
A simple example that identifies NVIDIA's product families through the Direct3D adapter identifier. See here to access an updated list of NVIDIA device IDs.
Vendor/Device ID Sample (User Guide)
A simple sample showing how to retrieve the vendor and device IDs for the primary display device.
Practical Perf. Analysis
An overview of the graphics tuning process (with video).
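As a rough illustration of the idea behind the NVDeviceID and Vendor/Device ID samples listed above, the sketch below checks whether an adapter's vendor ID is NVIDIA's PCI vendor ID (0x10DE) and formats the ID pair for logging. It is not the samples' code. In Direct3D 9 these values would normally come from the VendorId and DeviceId fields of the D3DADAPTER_IDENTIFIER9 structure filled in by IDirect3D9::GetAdapterIdentifier; here a hypothetical AdapterIds struct stands in for them so the example compiles without the DirectX SDK. The names AdapterIds, isNvidiaAdapter, and formatAdapterIds are assumptions made for this example.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// PCI vendor ID assigned to NVIDIA.
constexpr std::uint32_t kNvidiaVendorId = 0x10DE;

// Hypothetical stand-in for the VendorId/DeviceId fields that a real
// application would read from D3DADAPTER_IDENTIFIER9.
struct AdapterIds {
    std::uint32_t vendorId;
    std::uint32_t deviceId;
};

// Returns true when the adapter reports NVIDIA's vendor ID.
bool isNvidiaAdapter(const AdapterIds& ids) {
    return ids.vendorId == kNvidiaVendorId;
}

// Formats the two IDs as zero-padded uppercase hex, e.g. for a log line.
std::string formatAdapterIds(const AdapterIds& ids) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "vendor=0x%04X device=0x%04X",
                  static_cast<unsigned>(ids.vendorId),
                  static_cast<unsigned>(ids.deviceId));
    return std::string(buf);
}
```

Mapping a device ID to a product family would additionally need a lookup table built from NVIDIA's published device ID list, which this sketch deliberately leaves out.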
Presentations from GDC 2008:
GPU Optimization with the Latest NVIDIA Performance Tools
Also see the companion case study: Optimizing Marble Blast Ultra
Advanced Visual Effects with Direct3D:
GPU Gems 2 online:
Chapter 3. Inside Geometry Instancing
Chapter 4. Segment Buffering
Chapter 5. Optimizing Resource Management with Multistreaming
Chapter 6. Hardware Occlusion Queries Made Useful
Chapter 7. Adaptive Tessellation of Subdivision Surfaces with Displacement Mapping
Chapter 24. Using Lookup Tables to Accelerate Color Transformations
GPU Gems online:
Chapter 28. Graphics Pipeline Performance
Chapter 29. Efficient Occlusion Culling
Chapter 35. Leveraging High-Quality Software Rendering Effects in Real-Time Applications

Performance Tools

PerfKit
A comprehensive suite of performance tools, including:
PerfHUD
The premier real-time performance overview tool.
PerfSDK
An API for collecting performance data from within your game code.
GLExpert
OpenGL performance monitoring and tracing.
PerfGraph
A cross-platform, open-source performance monitor.
ShaderPerf 2
A command-line profiler for shader code (also incorporated into FX Composer 2.5).
gDEBugger
OpenGL and OpenGL ES profiling and debugging.

Want to Learn More? NVIDIA Documentation Home Page



