PerfWorks is a C++ API for GPU performance analysis on NVIDIA GPUs. It lets developers instrument an application and access low-level performance metrics, so they can quickly recognize the top performance limiters and change the application to remove the associated bottlenecks. These metrics can be collected over user ranges, draw calls, or dispatches. There are four metric categories: cumulative work, timing, activity, and throughput.

  • Cumulative work includes work achieved over time, such as the number of shaded pixels.
  • Timing refers to time duration or average clock rate calculations that make it easier to understand a scenario.
  • Activity tells you where the GPU is stalled and where the GPU is active.
  • Throughput refers to the rate of operations, e.g., the rate at which instructions are executed.
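To make the relationship between the categories concrete, the sketch below derives a throughput value and an activity value from raw cumulative-work and timing counts. It is an illustration only: the `RangeSample` struct, its fields, and both helper functions are hypothetical names invented for this example and are not part of the PerfWorks API, and the formulas are one plausible reading of the category descriptions above.

```cpp
#include <cstdint>

// Hypothetical raw values measured over one profiled range.
// None of these names come from the PerfWorks SDK.
struct RangeSample {
    std::uint64_t instructionsExecuted; // cumulative work
    std::uint64_t elapsedCycles;        // timing
    std::uint64_t activeCycles;         // cycles in which the unit was not stalled
};

// Throughput as a rate: cumulative work divided by elapsed time.
// Returns 0.0 for an empty range to avoid dividing by zero.
double throughputPerCycle(const RangeSample& s) {
    if (s.elapsedCycles == 0) return 0.0;
    return static_cast<double>(s.instructionsExecuted) /
           static_cast<double>(s.elapsedCycles);
}

// Activity as the fraction of elapsed cycles in which the unit was active.
double activeFraction(const RangeSample& s) {
    if (s.elapsedCycles == 0) return 0.0;
    return static_cast<double>(s.activeCycles) /
           static_cast<double>(s.elapsedCycles);
}
```

Under these assumptions, a range that executes 1500 instructions in 1000 cycles, 750 of them active, has a throughput of 1.5 instructions per cycle and an active fraction of 0.75.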

PerfWorks is the successor to NVIDIA’s Perfkit. PerfWorks adds range-based profiling and supports next-generation APIs that feature multi-threaded GPU work submission. GPU generations supported by PerfWorks include Maxwell, Pascal, and future generations when available. PerfWorks is used by NVIDIA internal tools, including Tegra Graphics Debugger, Nsight Visual Studio Edition, and other future products.

Key Features

  • Support for collecting Graphics Metrics. See Figure 1 below.
    Figure 1

  • Support for collecting Compute Metrics. See Figure 2 below.
    • These Compute Metrics help identify whether a workload is compute-bound, memory-bound, or latency-bound.

    Figure 2

  • Range Based Profiling
    • Other tools profile one kernel or draw call at a time. With PerfWorks, a developer can profile them together as a range, which preserves the inherent parallelism of execution.
  • Multi-Pass Profiling
    • The hardware has a limited number of physical counters. To collect more than the physical limit, PerfWorks requires the application to deterministically replay the GPU work multiple times. During each replay, the application must make the same GPU calls with the same range delimiters, while PerfWorks collects a different set of counters.
  • D3D12 Support
    • Supports multithreaded GPU work submission. D3D12 uses a different programming model than CUDA or OpenGL.
  • Support for nvperf
    • nvperf is a command-line tool for offline querying of PerfWorks metrics.
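The multi-pass idea described above can be sketched in a few lines: split the requested counters into groups no larger than the hardware limit, then replay the identical workload once per group. This is a minimal illustration under stated assumptions, not the PerfWorks implementation. `CounterSet`, `planPasses`, `collectAll`, and the `replayFrame` callback are hypothetical names, and a real scheduler likely has to respect hardware constraints that a simple fixed-size split ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical illustration only: none of these names come from the PerfWorks SDK.
using CounterSet = std::vector<std::string>;

// Split the requested counters into passes of at most `hwLimit` counters each.
std::vector<CounterSet> planPasses(const CounterSet& requested, std::size_t hwLimit) {
    assert(hwLimit > 0 && "at least one physical counter is required");
    std::vector<CounterSet> passes;
    for (std::size_t i = 0; i < requested.size(); i += hwLimit) {
        const std::size_t end = std::min(i + hwLimit, requested.size());
        passes.emplace_back(requested.begin() + static_cast<std::ptrdiff_t>(i),
                            requested.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return passes;
}

// Replay the same workload once per pass. The callback stands in for the
// application code, which must issue identical GPU calls and range delimiters
// on every replay; only the active counter set changes between passes.
std::size_t collectAll(const CounterSet& requested, std::size_t hwLimit,
                       const std::function<void(const CounterSet&)>& replayFrame) {
    const std::vector<CounterSet> passes = planPasses(requested, hwLimit);
    for (const CounterSet& pass : passes) {
        replayFrame(pass);
    }
    return passes.size(); // number of replays that were needed
}
```

With five requested counters and an assumed limit of two physical counters, `planPasses` yields three passes, so the workload is replayed three times.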

Supported Platforms

PerfWorks supports NVIDIA Tegra, GeForce, Quadro, and Tesla GPUs based on the NVIDIA Maxwell and Pascal architectures.

Developer Webinars

Instructional Videos and Webinars can be found here.


PerfWorks SDK is available to developers on an evaluation basis. Requests for consideration to receive the SDK can be sent to

PIX for Windows

Microsoft’s PIX for Windows will support profiling for NVIDIA GPUs using PerfWorks. PIX for Windows is a Direct3D 12 performance tuning and debugging tool for game developers. It has a long history spanning three generations of Xbox consoles. For more information on PIX for Windows, see Microsoft’s blog.