Performance Analysis Tools

NVIDIA Nsight Systems

NVIDIA® Nsight™ Systems is a system-wide performance analysis tool designed to visualize an application's algorithms, help you identify the largest optimization opportunities, and tune your code to scale efficiently across any number of CPUs and GPUs, from laptops to DGX servers.

NVIDIA® Nsight™

A development platform for heterogeneous computing. Work with powerful debugging and profiling tools to optimize the performance of your CPU and GPU code. Find out about the Eclipse Edition and the Visual Studio Edition, which adds graphics debugging.

NVIDIA Visual Profiler

This is a cross-platform performance profiling tool that gives developers vital feedback for optimizing CUDA C/C++ applications. First introduced in 2008, Visual Profiler supports all CUDA-capable NVIDIA GPUs shipped since 2006 on Linux, Mac OS X, and Windows.

TAU Performance System®

This is a profiling and tracing toolkit for performance analysis of hybrid parallel programs written in CUDA, PyCUDA, and OpenACC.

VampirTrace

A performance monitor with CUDA and PyCUDA support that gives detailed insight into the runtime behavior of accelerators. It enables extensive performance analysis and optimization of hybrid programs.

The PAPI CUDA Component

A hardware performance counter measurement technology for the NVIDIA CUDA platform that provides access to the hardware counters inside the GPU. It reports detailed performance counter information about the execution of GPU kernels.

NVIDIA CUDA Profiling Tools Interface (CUPTI)

CUPTI provides performance analysis tools with detailed information about GPU usage in a system. CUPTI is used by performance analysis tools such as the NVIDIA Visual Profiler, TAU, and VampirTrace.

NVIDIA Topology-Aware GPU Selection (NVTAGS)

NVTAGS is a toolset for HPC applications that enables faster solve times for applications with a high ratio of GPU communication time to total run time. NVTAGS intelligently and automatically assigns GPUs to Message Passing Interface (MPI) processes, thereby reducing overall GPU-to-GPU communication time.
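
The idea behind topology-aware GPU assignment can be illustrated with a small sketch. This is not NVTAGS's actual algorithm or API: the function names, the link-cost matrix, and the traffic matrix below are all hypothetical, and the exhaustive search is only practical for a handful of GPUs. It simply shows how placing heavily communicating MPI ranks on well-connected GPUs lowers a modeled communication cost.

```python
from itertools import permutations


def comm_cost(assignment, traffic, link_cost):
    """Modeled cost of a rank -> GPU assignment.

    assignment[r] is the GPU index given to MPI rank r. Each pair of ranks
    contributes its traffic volume multiplied by the cost of the link
    between the two GPUs they were placed on.
    """
    n = len(assignment)
    return sum(
        traffic[i][j] * link_cost[assignment[i]][assignment[j]]
        for i in range(n)
        for j in range(i + 1, n)
    )


def best_assignment(traffic, link_cost):
    """Exhaustively search all rank -> GPU permutations (small n only)."""
    n = len(traffic)
    return min(
        permutations(range(n)),
        key=lambda a: comm_cost(a, traffic, link_cost),
    )


# Hypothetical 4-GPU node: GPUs 0-1 and GPUs 2-3 share a fast link (cost 1),
# every other pair goes over a slower path (cost 4).
link_cost = [
    [0, 1, 4, 4],
    [1, 0, 4, 4],
    [4, 4, 0, 1],
    [4, 4, 1, 0],
]

# Hypothetical traffic volumes: ranks 0 and 2 exchange a lot of data,
# as do ranks 1 and 3; all other pairs exchange little.
traffic = [
    [0, 1, 10, 1],
    [1, 0, 1, 10],
    [10, 1, 0, 1],
    [1, 10, 1, 0],
]

naive = (0, 1, 2, 3)  # rank r simply uses GPU r
best = best_assignment(traffic, link_cost)

print("naive cost:", comm_cost(naive, traffic, link_cost))
print("best assignment:", best, "cost:", comm_cost(best, traffic, link_cost))
```

With these made-up numbers, the naive rank-equals-GPU mapping has a modeled cost of 90, while moving each heavily communicating pair onto a fast link brings it down to 36. In practice, a mapping like this is commonly applied by restricting each MPI rank to its chosen device, for example through the CUDA_VISIBLE_DEVICES environment variable.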

Looking for expert advice on optimizing your GPU code or identifying opportunities for GPU acceleration in your application?
Try reviewing the documentation and educational materials below, or get in touch with industry experts and NVIDIA engineers on the CUDA Developer forums.