Last Updated: 10/13/2008
Real-Time Performance Optimization
Fashions may come and go, but horsepower never goes out of style. The listings below
cover some of NVIDIA's work on performance tools, data amplification,
and other techniques and technologies aimed at delivering the highest possible
graphics and computing performance for developers.
- Samples from NVIDIA Graphics SDK 10.5:
- Skinned Instancing (Whitepaper)
- This sample shows the use of instancing and vertex texture
fetch on the GeForce 8 Series to implement a crowd of
GPU-animated characters, each animating independently but all
drawn with a single draw call.
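The whitepaper documents the sample's actual data layout. The sketch below only illustrates the addressing idea behind combining instancing with vertex texture fetch, under an assumed layout in which the bone matrices of every animation frame are packed into one texture and each instance supplies its own frame index. The layout, the texels-per-matrix count, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch; the layout and all names are hypothetical.
// Assumed layout: each bone matrix occupies 4 consecutive RGBA32F texels, one per
// matrix row. All bones of frame 0 come first, then all bones of frame 1, and so
// on, wrapped row by row into a texture that is `textureWidth` texels wide.
constexpr std::uint32_t kTexelsPerMatrix = 4;

struct TexelCoord {
    std::uint32_t x;
    std::uint32_t y;
};

// Locates row `row` of bone `bone` in animation frame `frame`. A vertex shader
// would repeat this arithmetic before its texture fetches, reading `frame` from
// per-instance data so every character in the single draw call can be at a
// different point in its animation.
TexelCoord boneTexel(std::uint32_t frame, std::uint32_t bone, std::uint32_t row,
                     std::uint32_t bonesPerFrame, std::uint32_t textureWidth) {
    assert(bone < bonesPerFrame && row < kTexelsPerMatrix && textureWidth > 0);
    const std::uint32_t linear = (frame * bonesPerFrame + bone) * kTexelsPerMatrix + row;
    return TexelCoord{linear % textureWidth, linear / textureWidth};
}
```

For example, with 20 bones per frame and a 64-texel-wide texture, row 3 of bone 5 in frame 2 is linear texel 183, which lands at (55, 2).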
- Instanced Tessellation
- This example shows how to simulate tessellation using
instancing. Per-patch tessellation levels are implemented as
described by Dyken et al. in Semi-uniform Adaptive Patch
Tessellation.
- The example uses tessellation to render a displaced subdivision
surface, using a precomputed Bézier approximation to the
Catmull-Clark surface. Control points of the Bézier mesh are
computed using the algorithm by Loop and Schaefer in Approximating
Catmull-Clark Subdivision Surfaces with Bicubic Patches.
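As a rough illustration of the two ingredients in the Instanced Tessellation sample, the sketch below evaluates a bicubic Bézier patch from its 16 control points and samples it on a uniform grid whose density is chosen per patch. It is a simplification: it does not implement the semi-uniform, crack-free scheme of Dyken et al., nor the Loop and Schaefer control-point computation (the control points are simply given), and the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch; all names are hypothetical.
struct Vec3 {
    float x, y, z;
};

// 4x4 control points indexed [row][column]; u runs along columns, v along rows.
using Patch = std::array<std::array<Vec3, 4>, 4>;

// Cubic Bernstein basis functions B0..B3 evaluated at t in [0, 1].
std::array<float, 4> bernstein3(float t) {
    const float s = 1.0f - t;
    return {s * s * s, 3.0f * s * s * t, 3.0f * s * t * t, t * t * t};
}

// Evaluates the bicubic Bezier patch at parameter (u, v).
Vec3 evalBezierPatch(const Patch& cp, float u, float v) {
    const std::array<float, 4> bu = bernstein3(u);
    const std::array<float, 4> bv = bernstein3(v);
    Vec3 p{0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            const float w = bv[i] * bu[j];
            p.x += w * cp[i][j].x;
            p.y += w * cp[i][j].y;
            p.z += w * cp[i][j].z;
        }
    }
    return p;
}

// Samples one patch on a uniform (level + 1) x (level + 1) grid. In an instanced
// renderer, the (u, v) grid for each level would sit in a vertex buffer shared by
// all patches drawn at that level, with the evaluation done in the vertex shader.
std::vector<Vec3> tessellatePatch(const Patch& cp, int level) {
    assert(level >= 1);
    std::vector<Vec3> verts;
    verts.reserve(static_cast<std::size_t>((level + 1) * (level + 1)));
    for (int r = 0; r <= level; ++r) {
        for (int c = 0; c <= level; ++c) {
            const float u = static_cast<float>(c) / level;
            const float v = static_cast<float>(r) / level;
            verts.push_back(evalBezierPatch(cp, u, v));
        }
    }
    return verts;
}
```

Because the Bernstein polynomials sum to one, a patch whose control points are evenly spaced on a plane reproduces that plane exactly, which makes a convenient sanity check.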
- Samples from NVIDIA Graphics SDK 9.52:
- SLI Best Practices (Whitepaper)
- This code sample demonstrates the proper way to detect
SLI-configured systems, as well as how to achieve maximum
performance benefit from SLI.
- DXSAS Sample Implementation 0.8 (User Guide)
- This running code sample demonstrates how to implement a
DirectX Semantics and Annotations (DXSAS) ScriptExecute parser
in an engine. Full support for the standard annotations and
semantics is provided. The user interface lets you apply
multiple scene and model effects simultaneously, so that you can
see hundreds of different effect combinations. All effect files
were developed using FX Composer.
- Instancing (Whitepaper)
- This sample uses Microsoft's Direct3D9 Instancing Group to
render thousands of meshes on the screen with only a handful of
draw calls. This significantly reduces the CPU overhead of
submitting many separate draw calls and is a great technique
for rendering trees, rocks, grass, RTS units and other groups
of similar (but not necessarily identical) objects.
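The CPU-side half of instancing is deciding which objects can share a draw call. A minimal sketch, assuming a hypothetical scene representation in which each object refers to a shared mesh by id: objects are grouped by mesh, and each group's per-instance data would then be copied into a second vertex stream (in Direct3D 9, one whose frequency is set with IDirect3DDevice9::SetStreamSourceFreq) and drawn with a single call. The types and function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Illustrative sketch; the types and function name are hypothetical.
// A scene object: a shared mesh id plus its own per-instance data.
struct SceneObject {
    int meshId;
    float position[3];
};

// One instanced draw call: a mesh and every object that uses it.
struct InstanceBatch {
    int meshId;
    std::vector<const SceneObject*> instances;
};

// Groups objects by mesh. The returned pointers refer into `objects`, so the
// vector must outlive the batches and must not be resized while they are in use.
std::vector<InstanceBatch> buildInstanceBatches(const std::vector<SceneObject>& objects) {
    std::map<int, std::vector<const SceneObject*>> byMesh;
    for (const SceneObject& obj : objects) {
        byMesh[obj.meshId].push_back(&obj);
    }
    std::vector<InstanceBatch> batches;
    batches.reserve(byMesh.size());
    for (auto& [meshId, instances] : byMesh) {
        batches.push_back(InstanceBatch{meshId, std::move(instances)});
    }
    return batches;
}
```

With 10,000 trees drawn from three distinct meshes, this yields three batches, so three draw calls instead of 10,000.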
- Pseudo Instancing in OpenGL (Whitepaper)
- This sample demonstrates a technique for speeding up the
rendering of instanced geometry with GLSL.
- Query Sample (User Guide)
- Shows how to check for availability and use of the various
query types supported in DirectX9. This sample queries for and
displays results for queries of type: event, occlusion,
timestamp, timestamp frequency, timestamp disjoint, and if
running with the debug runtime, resource and vertex stats.
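Of the query types the Query Sample lists, the three timestamp queries are used together to time GPU work. The sketch below shows only the arithmetic applied after the results have been read back (in Direct3D 9, with IDirect3DQuery9::GetData); the struct is a hypothetical container, not part of the API. A disjoint result means the timestamps cannot be trusted (for example, because the counter frequency changed), so the measurement is discarded.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Illustrative sketch; the struct and function name are hypothetical.
// Raw results of the timestamp-related queries after readback.
struct TimestampResults {
    std::uint64_t begin;      // D3DQUERYTYPE_TIMESTAMP issued before the measured work
    std::uint64_t end;        // D3DQUERYTYPE_TIMESTAMP issued after it
    std::uint64_t frequency;  // D3DQUERYTYPE_TIMESTAMPFREQ: ticks per second
    bool disjoint;            // D3DQUERYTYPE_TIMESTAMPDISJOINT: true if timestamps are unreliable
};

// Converts a pair of GPU timestamps to milliseconds, or returns nothing when the
// measurement cannot be trusted.
std::optional<double> gpuMilliseconds(const TimestampResults& r) {
    if (r.disjoint || r.frequency == 0 || r.end < r.begin) {
        return std::nullopt;
    }
    const double ticks = static_cast<double>(r.end - r.begin);
    return ticks * 1000.0 / static_cast<double>(r.frequency);
}
```

For example, 2,000 ticks at a frequency of 1,000,000 ticks per second is 2 ms of GPU time.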
- Get GPU and System Info (Whitepaper)
- This sample queries Microsoft's IDXDiagContainer interface to
retrieve graphics hardware and system information. Most notable
is the retrieval of the amount of physical video memory on the
primary graphics device. The IDXDiagContainer interface is
wrapped in a convenient C++ class, and no IDirect3DDevice9
object is required to retrieve the information.
- PBO Texture Performance (User Guide)
- Explores various ways to use OpenGL pixel buffer objects
(PBOs). This code sample can also be used to see the maximum
bus-transfer rates for textures to and from the GPU.
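Bus-transfer rates such as those the PBO sample reports come down to bytes moved per second of elapsed time. A sketch of that arithmetic with hypothetical function names. Because PBO transfers can be asynchronous, the timer should only be stopped once the transfers have actually completed (for example, after glFinish); otherwise it measures only the API calls.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch; the function names are hypothetical.

// Size in bytes of one 2D texture level, ignoring mipmaps and row padding.
std::uint64_t textureBytes(std::uint32_t width, std::uint32_t height,
                           std::uint32_t bytesPerPixel) {
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel;
}

// Effective rate in MB/s (1 MB = 2^20 bytes) for `transfers` copies of
// `bytesPerTransfer` bytes that together took `seconds` of elapsed time.
double transferRateMBps(std::uint64_t bytesPerTransfer, std::uint32_t transfers,
                        double seconds) {
    assert(seconds > 0.0);
    const double totalBytes = static_cast<double>(bytesPerTransfer) * transfers;
    return totalBytes / (1024.0 * 1024.0) / seconds;
}
```

For example, 100 uploads of a 1024 x 1024 RGBA8 texture (4 MB each) that complete in 0.5 s correspond to 800 MB/s.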
- NVTriStrip Test Application (User Guide)
- This simple example demonstrates the use of the NvTriStrip optimizer library.
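NvTriStrip reorders triangles into strips, in which every index after the first two forms a triangle with the two before it. The sketch below is not part of the library. It shows the reverse mapping, expanding a strip back into a triangle list, which makes the implicit winding rule explicit: every second triangle has its first two vertices swapped so that all triangles face the same way. Repeated indices, commonly used to join strips, produce degenerate triangles that are skipped. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative sketch; the function name is hypothetical, not NvTriStrip API.
// Triangle k uses strip positions k, k+1, k+2; on odd k the first two indices are
// swapped so that every triangle keeps the winding of the first one. Triangles
// with repeated indices (degenerates, often used to join strips) are dropped.
std::vector<std::uint16_t> stripToList(const std::vector<std::uint16_t>& strip) {
    std::vector<std::uint16_t> list;
    if (strip.size() >= 3) {
        list.reserve((strip.size() - 2) * 3);
    }
    for (std::size_t k = 0; k + 2 < strip.size(); ++k) {
        std::uint16_t a = strip[k];
        std::uint16_t b = strip[k + 1];
        const std::uint16_t c = strip[k + 2];
        if (a == b || b == c || a == c) {
            continue;  // degenerate triangle: zero area, nothing to draw
        }
        if (k % 2 == 1) {
            std::swap(a, b);  // restore consistent winding on odd triangles
        }
        list.insert(list.end(), {a, b, c});
    }
    return list;
}
```

The index counts show the appeal of strips: a strip of n indices describes n - 2 triangles, while a list needs 3(n - 2) indices for the same triangles.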
- Normalization Heuristics (Whitepaper)
- This entry and the accompanying technical report answer the
question: "When is cube-map normalization faster than
normalize()?" The report describes experiments performed with
a non-trivial pixel shader, and uses the experimental results
to derive useful rules of thumb regarding the performance and
quality of normalization in pixel shaders. These heuristics
provide tuning dials that developers can use to trade quality
for performance (and vice versa) in 3D applications. To gain an
intuitive understanding of these performance-quality tradeoffs,
you can use the accompanying application to run the same
experiments described in the report.
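The cube-map side of the Normalization Heuristics comparison replaces the arithmetic of normalize() with a texture lookup. Each texel of a normalization cube map stores the unit vector pointing at it, so sampling the map with an unnormalized vector returns that vector normalized, to within the map's resolution and 8-bit precision. A minimal sketch of building one face on the CPU, assuming RGB8 texels and the face orientation table from the OpenGL specification; the names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative sketch; all names are hypothetical.
enum class CubeFace { PosX, NegX, PosY, NegY, PosZ, NegZ };

// Direction through face coordinates (sc, tc) in [-1, 1], obtained by inverting
// the cube-map face selection table of the OpenGL specification.
std::array<float, 3> cubeDirection(CubeFace face, float sc, float tc) {
    switch (face) {
        case CubeFace::PosX: return {1.0f, -tc, -sc};
        case CubeFace::NegX: return {-1.0f, -tc, sc};
        case CubeFace::PosY: return {sc, 1.0f, tc};
        case CubeFace::NegY: return {sc, -1.0f, -tc};
        case CubeFace::PosZ: return {sc, -tc, 1.0f};
        case CubeFace::NegZ: return {-sc, -tc, -1.0f};
    }
    return {0.0f, 0.0f, 0.0f};  // not reached for valid faces
}

// Fills one size x size face with RGB8 texels encoding the unit vector toward
// each texel centre, mapped from [-1, 1] to [0, 255].
std::vector<std::uint8_t> buildNormalizationFace(CubeFace face, int size) {
    assert(size > 0);
    std::vector<std::uint8_t> rgb;
    rgb.reserve(static_cast<std::size_t>(size) * size * 3);
    for (int y = 0; y < size; ++y) {
        for (int x = 0; x < size; ++x) {
            const float sc = 2.0f * (x + 0.5f) / size - 1.0f;
            const float tc = 2.0f * (y + 0.5f) / size - 1.0f;
            const std::array<float, 3> d = cubeDirection(face, sc, tc);
            const float len = std::sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2]);
            for (float c : d) {
                const float encoded = (c / len * 0.5f + 0.5f) * 255.0f;
                rgb.push_back(static_cast<std::uint8_t>(std::lround(encoded)));
            }
        }
    }
    return rgb;
}
```

In the shader, the fetched value is expanded back with n = 2 * sample - 1. The face resolution and the 8-bit encoding are the quality dials; the texture fetch in place of ALU work is the performance dial.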
- Atlas Comparison Viewer (Whitepaper)
- This sample compares texturing from regular textures versus
textures from an atlas. Putting multiple textures in an atlas
can help reduce draw calls, thus decreasing CPU load.
Comparisons are made with respect to both image quality and
performance.
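Using an atlas means each object's texture coordinates must be remapped into the sub-rectangle its texture occupies. A minimal sketch with hypothetical names, assuming coordinates stay within [0, 1]. Tiling (wrap) addressing cannot be used inside an atlas, and mipmapping can blend texels from neighbouring sub-textures; these are two common reasons an atlas can affect image quality.

```cpp
#include <cassert>

// Illustrative sketch; the types and function name are hypothetical.

// Where one source texture sits inside the atlas, in normalized atlas coordinates.
struct AtlasRect {
    float u0, v0;         // corner of the sub-rectangle
    float width, height;  // extent of the sub-rectangle
};

struct UV {
    float u, v;
};

// Maps a coordinate in [0, 1] of the original texture into the atlas.
// Coordinates outside [0, 1] (tiling) are not supported, checked here with
// assert, because they would sample neighbouring sub-textures.
UV toAtlas(const AtlasRect& r, UV uv) {
    assert(uv.u >= 0.0f && uv.u <= 1.0f && uv.v >= 0.0f && uv.v <= 1.0f);
    return UV{r.u0 + uv.u * r.width, r.v0 + uv.v * r.height};
}
```

For example, the centre (0.5, 0.5) of a texture placed at (0.5, 0.25) with extent 0.25 x 0.25 maps to (0.625, 0.375) in the atlas.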
- Occlusion Query - OpenGL
- This sample illustrates occlusion query using a simple sphere and a plane.
- Occlusion Query - DirectX (Whitepaper)
- This sample shows usage of occlusion queries to cull out
complex objects and save bandwidth to the card. Occlusion
queries report how many pixels a set of draw calls actually
wrote to.
- NVDeviceID (User Guide)
- A simple example of identifying NVIDIA's product families through
the Direct3D adapter identifier. NVIDIA also maintains an updated
list of its device IDs.
- Vendor/Device ID Sample (User Guide)
- A simple sample that shows how to retrieve the vendor and device IDs of
the primary display device.
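Both device-ID samples come down to reading the adapter identifier, which in Direct3D 9 is filled in by IDirect3D9::GetAdapterIdentifier as a D3DADAPTER_IDENTIFIER9 with VendorId and DeviceId fields. A sketch of the vendor check that usually follows, using the PCI vendor IDs for NVIDIA (0x10DE), ATI/AMD (0x1002), and Intel (0x8086). The function name is hypothetical, and mapping a DeviceId to a product family is better done with NVIDIA's published device ID list than with hard-coded ranges.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative sketch; the function name is hypothetical.
// PCI vendor IDs, as reported in D3DADAPTER_IDENTIFIER9::VendorId.
constexpr std::uint32_t kVendorNvidia = 0x10DE;
constexpr std::uint32_t kVendorAtiAmd = 0x1002;
constexpr std::uint32_t kVendorIntel = 0x8086;

// Returns a readable vendor name for a PCI vendor ID.
std::string vendorName(std::uint32_t vendorId) {
    switch (vendorId) {
        case kVendorNvidia: return "NVIDIA";
        case kVendorAtiAmd: return "ATI/AMD";
        case kVendorIntel: return "Intel";
        default: return "unknown";
    }
}
```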
- Practical Perf. Analysis
- An overview of the graphics tuning process (with video).
- Presentations from GDC 2008:
- GPU Optimization with the Latest NVIDIA Performance Tools
- Also see the companion case study: Optimizing
Marble Blast Ultra
- Advanced Visual Effects with Direct3D:
- GPU Gems 2 online:
- Chapter 3. Inside Geometry Instancing
- Chapter 4. Segment Buffering
- Chapter 5. Optimizing Resource Management with
Multistreaming
- Chapter 6. Hardware Occlusion Queries Made Useful
- Chapter 7. Adaptive Tessellation of Subdivision
Surfaces with Displacement Mapping
- Chapter 24. Using Lookup Tables to Accelerate
Color Transformations
- GPU Gems online:
- Chapter 28. Graphics Pipeline Performance
- Chapter 29. Efficient Occlusion Culling
- Chapter 35. Leveraging High-Quality Software
Rendering Effects in Real-Time Applications
- PerfKit
- A comprehensive suite of performance tools, including:
- PerfHUD
- The premier real-time performance overview tool.
- PerfSDK
- An API for collecting performance data from within your game code.
- GLExpert
- OpenGL performance monitoring and tracing.
- PerfGraph
- A cross-platform and open-source performance monitor.
- ShaderPerf 2
- A command-line profiler for shader code (also incorporated into FX Composer 2.5).
- gDEBugger
- OpenGL and OpenGL ES profiling and debugging.
Want to Learn More?
NVIDIA Documentation Home Page