The NVIDIA CUDA Profiling Tools Interface (CUPTI) gives performance analysis tools detailed information about how applications use the GPUs in a system. CUPTI provides two simple yet powerful mechanisms that allow performance analysis tools such as the NVIDIA Visual Profiler, TAU, and Vampir Trace to understand the inner workings of an application and deliver valuable insights to developers.
The first mechanism is a callback API that allows tools to inject analysis code at the entry and exit points of each CUDA C Runtime (CUDART) and CUDA Driver API function. Using this callback API, tools can monitor an application's interactions with the CUDA Runtime and the CUDA driver. The second mechanism allows performance analysis tools to query and configure the hardware event counters built into the GPU and the software event counters in the CUDA driver. These event counters record activity such as instruction counts, memory transactions, cache hits and misses, and divergent branches.
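The entry/exit callback pattern described above can be sketched in plain C without any GPU. This is a minimal illustration of the idea, not CUPTI code: every name in it (trace_subscribe, fake_memcpy_api, call_site, trace_stats) is hypothetical, and the wrapped function is an ordinary memcpy standing in for a CUDA API call. In CUPTI itself, a tool registers its callback with cuptiSubscribe and selects what to observe with calls such as cuptiEnableDomain; consult the CUPTI User Guide for the exact signatures.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; none of these names are CUPTI's. */
typedef enum { SITE_ENTER, SITE_EXIT } call_site;

typedef void (*trace_callback)(void *userdata, const char *function_name,
                               call_site site);

/* State owned by the tool and handed back to it on every callback. */
typedef struct {
    unsigned enters;
    unsigned exits;
} trace_stats;

static trace_callback g_callback = NULL;
static void *g_userdata = NULL;

/* Register one callback to run around every wrapped call (NULL disables it). */
static void trace_subscribe(trace_callback cb, void *userdata) {
    g_callback = cb;
    g_userdata = userdata;
}

/* Stand-in for an API function that the tool wants to observe. */
static int fake_memcpy_api(char *dst, const char *src, size_t n) {
    assert(dst != NULL && src != NULL);
    if (g_callback) g_callback(g_userdata, "fake_memcpy_api", SITE_ENTER);
    memcpy(dst, src, n);
    if (g_callback) g_callback(g_userdata, "fake_memcpy_api", SITE_EXIT);
    return 0;
}

/* The tool's analysis code: count and log each entry and exit. */
static void count_calls(void *userdata, const char *function_name,
                        call_site site) {
    trace_stats *stats = (trace_stats *)userdata;
    if (site == SITE_ENTER) {
        stats->enters++;
    } else {
        stats->exits++;
    }
    printf("%s %s\n", site == SITE_ENTER ? "enter" : "exit", function_name);
}

/* Subscribe, make two observed calls, unsubscribe, and return the counts.
   Call this from main() to try the sketch. */
static trace_stats run_trace_demo(void) {
    trace_stats stats = {0, 0};
    char buffer[4];
    trace_subscribe(count_calls, &stats);
    fake_memcpy_api(buffer, "abc", 4);
    fake_memcpy_api(buffer, "xyz", 4);
    trace_subscribe(NULL, NULL);
    return stats;
}
```

Passing a userdata pointer through the subscription keeps the tool's state out of global variables, so the callback can accumulate statistics without knowing who invoked it.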

Key Features

 

  • Trace CUDA API usage by registering callbacks for API calls of interest
    • Full support for entry and exit points in the CUDA C Runtime (CUDART) and CUDA Driver API
  • Sample hardware and software event counters, including:
    • Instruction count and throughput
    • Memory load/store events and throughput
    • Cache hits/misses
    • Branches and divergent branches
    • Many more
  • Enable automated bottleneck identification based on metrics such as instruction throughput and memory throughput
  • Obtain normalized timestamps for CPU and GPU events

See the CUPTI User Guide for a complete listing of hardware and software event counters available for performance analysis tools.
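Metrics such as instruction throughput are typically derived from raw counter values rather than read directly. The sketch below shows the arithmetic on two counter snapshots taken at different times. The counter_sample struct and its field names are hypothetical and are not CUPTI event names; it assumes the timestamps are already normalized to a common clock in nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of counter values; field names are illustrative,
   not CUPTI event names. */
typedef struct {
    uint64_t timestamp_ns;  /* assumed normalized to a common clock */
    uint64_t instructions;
    uint64_t cache_hits;
    uint64_t cache_misses;
} counter_sample;

/* Instruction throughput: instructions executed per second between samples. */
static double instructions_per_second(const counter_sample *begin,
                                      const counter_sample *end) {
    assert(end->timestamp_ns > begin->timestamp_ns);
    uint64_t delta_instructions = end->instructions - begin->instructions;
    uint64_t delta_ns = end->timestamp_ns - begin->timestamp_ns;
    return (double)delta_instructions / (double)delta_ns * 1e9;
}

/* Cache hit rate: fraction of accesses between samples that hit the cache.
   Returns 0.0 when no accesses were recorded. */
static double cache_hit_rate(const counter_sample *begin,
                             const counter_sample *end) {
    uint64_t hits = end->cache_hits - begin->cache_hits;
    uint64_t misses = end->cache_misses - begin->cache_misses;
    uint64_t total = hits + misses;
    return total == 0 ? 0.0 : (double)hits / (double)total;
}
```

A tool could flag a likely bottleneck by comparing values like these against the device's peak rates, which this sketch does not model.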
 


Figure: Vampir Trace supports GPU performance analysis.

Figure: TAU profiling GPU-accelerated NAMD.

Availability

The CUPTI library is available on every platform that the CUDA Toolkit supports. It can be obtained from the CUDA Downloads page as part of the CUDA Tools SDK.

References