Advanced API Performance: SetStablePowerState

This post covers best practices for using SetStablePowerState on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips.

Most modern processors, including GPUs, change processor core and memory clock rates during application execution. These changes cause performance to vary, which introduces errors in measurements and makes comparisons between runs difficult.

  • Use the nvidia-smi utility to set the GPU core and memory clocks before attempting measurements. This command is installed by typical driver installations on Windows and Linux. Installation locations may vary by OS version but should be fairly stable.
    • Run commands on an administrator console on Windows, or prepend sudo to the following commands on Linux-like OSs.
    • To query supported clock rates:
      • nvidia-smi --query-supported-clocks=timestamp,gpu_name,gpu_uuid,memory,graphics --format=csv
    • To set the core and memory clock rates, respectively:
      • nvidia-smi --lock-gpu-clocks=<core_clock_rate>
      • nvidia-smi --lock-memory-clocks=<memory_clock_rate>
    • Perform performance capture or other work.
    • To reset the core and memory clock rates, respectively:
      • nvidia-smi --reset-gpu-clocks
      • nvidia-smi --reset-memory-clocks
    • For general use during a project, it may be convenient to write a simple script to lock the clocks, launch your application, and after exit, reset the clocks.
    • For command-line help, run nvidia-smi --help. There are shortened versions of the commands listed earlier for your convenience.
    • For more information, see NVIDIA System Management Interface.
  • Use the DX12 function SetStablePowerState to read the GPU’s predetermined stable power clock rate. The stable GPU clock rate may vary by board.
    • Modify a DX12 sample to invoke SetStablePowerState.
    • With the SetStablePowerState sample running, execute nvidia-smi -q -d CLOCK and record the Graphics clock frequency. Use this frequency with the --lock-gpu-clocks option.
  • Use Nsight Graphics’s GPU Trace activity with the option to lock core and memory clock rates during profiling (Figure 1).
Figure 1. Lock Clocks to Base checkbox
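The lock, run, and reset sequence recommended above could be wrapped in a small script along these lines. This is a minimal sketch, not an official NVIDIA tool: the function name run_with_locked_clocks and the NVSMI override variable are hypothetical helpers introduced here, and the clock values you pass must come from the list reported by --query-supported-clocks for your board.

```shell
#!/bin/sh
# Sketch: lock GPU clocks, run a command, then always reset the clocks.
# NVSMI is a hypothetical override hook (not an nvidia-smi feature), e.g.:
#   NVSMI="sudo nvidia-smi"   on Linux-like OSs
#   NVSMI=echo                to preview the commands without touching the GPU
NVSMI="${NVSMI:-nvidia-smi}"

run_with_locked_clocks() {
    core_clock="$1"
    memory_clock="$2"
    shift 2

    # Lock the core clock; give up if the driver rejects the value.
    $NVSMI --lock-gpu-clocks="$core_clock" || return 1

    # Lock the memory clock; undo the core lock if this step fails.
    if ! $NVSMI --lock-memory-clocks="$memory_clock"; then
        $NVSMI --reset-gpu-clocks
        return 1
    fi

    # Run the application and remember its exit status.
    "$@"
    app_status=$?

    # Reset both clocks even if the application failed.
    $NVSMI --reset-gpu-clocks
    $NVSMI --reset-memory-clocks
    return "$app_status"
}
```

A call might look like run_with_locked_clocks <core_clock_rate> <memory_clock_rate> ./my_app, where ./my_app stands in for your own executable. Returning the application's exit status keeps the wrapper usable inside automated performance tests.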
  • Don’t rely solely on the SetStablePowerState function when profiling. SetStablePowerState does not lock the memory clock, which makes the results less comparable than when the appropriate clocks are locked with nvidia-smi.
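Because SetStablePowerState leaves the memory clock unlocked, one option is to read the graphics clock it selects while the sample is running, then lock both clocks with nvidia-smi. The following is a rough sketch under a few assumptions: it uses the clocks.gr query field rather than parsing the nvidia-smi -q -d CLOCK report, it only looks at the first GPU listed, and the function names and the NVSMI override variable are hypothetical helpers, not part of nvidia-smi.

```shell
#!/bin/sh
# Sketch: reuse the graphics clock observed while a SetStablePowerState
# sample is running, and lock the memory clock as well.
# NVSMI is a hypothetical override hook, e.g. NVSMI="sudo nvidia-smi".
NVSMI="${NVSMI:-nvidia-smi}"

# Print the current graphics clock in MHz for the first GPU listed.
current_graphics_clock() {
    $NVSMI --query-gpu=clocks.gr --format=csv,noheader,nounits |
        head -n 1 | tr -d ' '
}

# Lock the core clock to the observed value and the memory clock to $1.
lock_to_observed_clock() {
    memory_clock="$1"
    core_clock="$(current_graphics_clock)"

    # Refuse to continue if the query did not return a plain number.
    case "$core_clock" in
        ''|*[!0-9]*)
            echo "could not read graphics clock" >&2
            return 1
            ;;
    esac

    $NVSMI --lock-gpu-clocks="$core_clock" &&
        $NVSMI --lock-memory-clocks="$memory_clock"
}
```

Run lock_to_observed_clock <memory_clock_rate> while the sample is still active, and reset the clocks with --reset-gpu-clocks and --reset-memory-clocks when the measurements are done.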