Last Updated: 11/11/2008
David Lee Swenson, a graphics engineer who worked on the game Spore, had a problem. Spore featured a dynamic material system that generated hundreds of HLSL shaders. Finding trends, identifying poorly performing shaders and tracking shader performance characteristics over time and over numerous iterations created quite a challenge.
To tackle this challenge, Swenson built an automated shader performance monitoring system that used a combination of the NVIDIA® ShaderPerf utility and homegrown code.
The performance testing process was as follows:
- A built-in command in the Spore engine dumped all the shader permutations in use into individual HLSL files in a directory.
- For each shader, the system ran ShaderPerf and collected the performance results across all supported NVIDIA chipsets.
- A simple command-line utility that Swenson wrote then parsed this information into a comma-separated values (.csv) file so the results could be viewed and sorted in a spreadsheet. Building the parsing utility took approximately half a day.
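The steps above can be sketched roughly as follows. This is a minimal Python illustration, not Swenson's utility. The `measure` callable is a hypothetical hook standing in for whatever runs ShaderPerf on one shader for one GPU and parses its output; the tool's command line and output format are not described in this article, so they are deliberately left out. The `.hlsl` extension and the `shader` and `gpu` column names are also assumptions.

```python
import csv
from pathlib import Path
from typing import Callable, Iterable


def collect_results(shader_dir: str,
                    gpus: Iterable[str],
                    measure: Callable[[Path, str], dict],
                    out_csv: str) -> int:
    """Measure every dumped shader on every GPU and write one CSV row per pair.

    `measure` is a hypothetical hook: it should profile one shader file for
    one GPU and return a dict of results (for example {"cycles": 20}).
    Returns the number of rows written.
    """
    gpus = list(gpus)
    rows = []
    # Assumes the engine dumped one file per shader with a .hlsl extension.
    for shader in sorted(Path(shader_dir).glob("*.hlsl")):
        for gpu in gpus:
            result = measure(shader, gpu)
            rows.append({"shader": shader.name, "gpu": gpu, **result})

    # Build the header from every key seen, keeping first-seen order.
    fields = ["shader", "gpu"]
    for row in rows:
        for key in row:
            if key not in fields:
                fields.append(key)

    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Keeping the profiling step behind a callable means the directory walk and CSV layout can be tested without the profiler installed, and the resulting file opens directly in a spreadsheet for sorting.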
The data in the spreadsheet in Figure 1 reflects shaders used during combat in the Civilization game. The output from the Microsoft shader compiler is in the first three columns, and the two highlighted shaders have exactly the same instruction count, distribution, and texture usage. However, the next several columns show that these shaders have different performance characteristics on the NV40 chipset. This kind of insight is invaluable when optimizing shaders.
Figure 1: The Shader Performance Spreadsheet
Figure 2: Combat in Spore's Civilization Game
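The kind of comparison highlighted in the spreadsheet, shaders that look identical to the compiler but cost different amounts on a given chipset, can also be searched for automatically. The sketch below is one possible approach, not part of the Spore tooling. It assumes each row describes one shader on a single GPU, and the column names (`instructions`, `alu`, `tex`, `cycles`) are hypothetical.

```python
from collections import defaultdict


def find_divergent_twins(rows,
                         static_keys=("instructions", "alu", "tex"),
                         cost_key="cycles"):
    """Group shaders by compiler-reported statistics and return the groups
    whose members have different measured costs.

    `rows` is a list of dicts for ONE GPU; all key names are assumptions.
    """
    groups = defaultdict(list)
    for row in rows:
        signature = tuple(row[k] for k in static_keys)
        groups[signature].append(row)

    divergent = {}
    for signature, members in groups.items():
        costs = {m[cost_key] for m in members}
        # Only report groups with at least two shaders and differing costs.
        if len(members) > 1 and len(costs) > 1:
            divergent[signature] = sorted(members, key=lambda m: m[cost_key])
    return divergent
```

Each returned group is a candidate for closer inspection, since the static compiler output alone would not reveal the difference.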
The Spore team used the performance testing system in a semi-automated way. When a performance problem arose, they would first check whether the game was CPU bound or GPU bound. If the game was GPU bound, they’d determine the root of the problem by using tools such as the NVIDIA PerfHUD™ performance analysis application in addition to debug rendering modes in the game that show performance statistics such as pixel instruction cost, vertex instruction cost, and overdraw. They would then dump the shaders for the scene in question and run the shader performance tool. The spreadsheet let the team quickly find the underperforming shaders for the chipset in question.
For Swenson, the most important data that the system provided were the cycle counts and pixels/second numbers for every shader on each supported card, as well as the register usage of each shader. Using those numbers, it was easy to pinpoint and track the performance cost of each shader at any point in time. The system helped the Spore engineers quickly catch several big issues, including the erroneous use of overly long shaders (160 instructions), several bugs in the shader generation code, and a check-in in which a particular “low-quality” shader was erroneously replaced with the “high-quality” version.
Swenson got subtle insights from the data as well. Seeing the different instruction and throughput numbers across all the game’s supported GPUs in one place gave him a high-level understanding of the performance variations across different GPU generations. Having the data aggregated allowed him to quickly analyze select groups of shaders. Analyzing effect shaders as a group was especially useful because effects tend to appear on screen only sporadically. As a result, large variations in shader cost within a group were quickly identified and corrected.
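A group-level view like this is straightforward to compute from aggregated rows. The following is a minimal sketch under stated assumptions: the `group` and `cycles` column names are hypothetical, and the max/min ratio is just one simple way to express variation within a group, not necessarily the measure the Spore team used.

```python
from collections import defaultdict


def cost_spread_by_group(rows, group_key="group", cost_key="cycles"):
    """Return {group: (min_cost, max_cost, max/min ratio)} per shader group.

    Column names are assumptions; costs are converted to float so values
    read from a CSV file (strings) work as well.
    """
    costs = defaultdict(list)
    for row in rows:
        costs[row[group_key]].append(float(row[cost_key]))

    summary = {}
    for group, values in costs.items():
        low, high = min(values), max(values)
        ratio = high / low if low > 0 else float("inf")
        summary[group] = (low, high, ratio)
    return summary
```

Sorting the groups by ratio puts those with the widest cost variation at the top of the list.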
The Spore team chose to use ShaderPerf in a semi-automated way, but it would be easy to imagine a completely automated system using ShaderPerf as part of a game’s continuous integration system. We envision that on each build the system could run ShaderPerf, with any differences from the previous run accumulated into an e-mail or a Web page.
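The comparison step of such a system might look something like the sketch below. It is an illustration only: it assumes each run has already been reduced to rows with hypothetical `shader`, `gpu`, and `cycles` columns, and it leaves out how the report would be delivered by e-mail or published to a Web page.

```python
def diff_runs(previous, current, cost_key="cycles", tolerance=0.0):
    """Compare two runs keyed by (shader, gpu) and list what changed.

    Returns tuples of (key, status, old_cost, new_cost), where status is
    "added", "removed", or "changed". Column names are assumptions.
    """
    def index(rows):
        return {(r["shader"], r["gpu"]): float(r[cost_key]) for r in rows}

    old, new = index(previous), index(current)
    report = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            report.append((key, "added", None, new[key]))
        elif key not in new:
            report.append((key, "removed", old[key], None))
        elif abs(new[key] - old[key]) > tolerance:
            report.append((key, "changed", old[key], new[key]))
    return report
```

A build step could call `diff_runs` on the rows from the previous and current builds and notify the team only when the report is non-empty.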