The NVIDIA Visual Profiler is a cross-platform performance profiling tool that gives developers vital feedback for optimizing CUDA C/C++ applications. First introduced in 2008, Visual Profiler supports all 350 million+ CUDA-capable NVIDIA GPUs shipped since 2006 on Linux, Windows, and ARM. The NVIDIA Visual Profiler is available as part of the CUDA Toolkit.

Note that NVIDIA® CUDA Toolkit 11.0 (and later) no longer supports developing or running applications on macOS. Although no tools use macOS as a target environment, NVIDIA kept the macOS host version of Visual Profiler available up to CUDA 12.4. The macOS host versions were dropped as of CUDA 12.5.

Note that PowerPC versions were also dropped as of CUDA 12.5.

Overview

  • Focus on the information that matters
    Quickly identify potential performance bottlenecks in your applications using highly configurable tables and graphical views
  • Automated performance analysis
    Perform automated analysis of your application to identify performance bottlenecks and get optimization suggestions that can be used to improve performance
  • Unified CPU and GPU Timeline
    View CUDA activity occurring on both CPU and GPU in a unified timeline, including CUDA API calls, memory transfers, and kernel launches.
  • CUDA API trace
    View all memory transfers, kernel launches, and other API functions on the same timeline
  • Drill down to raw data
    Gain low-level insights by looking at performance metrics collected directly from GPU hardware counters and software instrumentation.
  • Compare results across multiple sessions
    Confirm performance improvements by comparing against previous sessions
  • Analyze data collected from remote systems
    Use the command-line profiler, configured through environment variables, to collect data from multiple systems, then analyze the results in Visual Profiler
  • CUDA Dynamic Parallelism
    View the timeline for applications that use CUDA Dynamic Parallelism, including both host-launched and device-launched kernels and the parent-child relationships between kernels.
  • Guided Application Analysis
    Use the guided analysis mode to get step-by-step analysis and optimization guidance. The analysis results now include graphical visualizations that indicate optimization opportunities more clearly.
  • Power, thermal, and clock profiling
    Observe how GPU power, thermal, and clock values vary during application execution

The latest version of Visual Profiler, with support for CUDA C/C++ applications, ships with the CUDA Toolkit and runs on every platform the toolkit supports.

Developers should also check out NVIDIA Nsight Systems, our next-generation profiling tool with Linux, Windows, and Arm support. Review our tool migration recommendations to make your transition easier.

For development and debugging on Windows, see Nsight Visual Studio Edition. To use NVIDIA Nsight Systems from within Visual Studio, see NVIDIA Nsight Integration.

For more information on the Visual Profiler and other CUDA development tools:

 

Questions on CUDA Tools?

If you encounter difficulty with any of the CUDA Tools or have more questions, please contact the NVIDIA tools team at cudatools@nvidia.com.