NVIDIA® Nsight™ Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging through a user interface and a command-line tool. Its baseline feature lets users compare results directly within the tool. Nsight Compute offers a customizable, data-driven user interface and metric collection, and can be extended with analysis scripts for post-processing results.



 Download 2022.3.0
Version 2022.3 New Features  |  Revision History

NVIDIA® Nsight™ Compute is available free of charge through the NVIDIA Registered Developer Program and as part of the CUDA Toolkit.


Roofline Analysis

Memory Workload Analysis

Baseline Comparisons

  • Set multiple baselines to compare variations in GPU architecture, kernel launch parameters, memory usage, and more
  • Compare performance metrics between baselines and the current run, including the ability to compare child processes
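
As a rough illustration of what a baseline comparison computes, the sketch below diffs two sets of metric values and reports the relative change of the current run against the baseline. The function name, metric names, and numbers are hypothetical placeholders. They are not real Nsight Compute output, and the function is not part of any Nsight Compute API.

```python
def compare_to_baseline(baseline, current):
    """Return {metric: percent change vs. baseline} for metrics present in both runs."""
    changes = {}
    for name, base_value in baseline.items():
        # Skip metrics missing from the current run or with a zero baseline,
        # since a relative change is undefined in those cases.
        if name not in current or base_value == 0:
            continue
        changes[name] = (current[name] - base_value) / base_value * 100.0
    return changes


# Hypothetical metric values for two profiling runs of the same kernel.
baseline_run = {"duration_us": 200.0, "dram_throughput_pct": 40.0}
current_run = {"duration_us": 150.0, "dram_throughput_pct": 50.0}

for metric, pct in compare_to_baseline(baseline_run, current_run).items():
    print(f"{metric}: {pct:+.1f}%")
```

With these placeholder values, the duration drops by 25% and the throughput figure rises by 25% relative to the baseline.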

Run from Nsight Compute GUI or from Console Command Line

  • The Nsight Compute GUI provides the text of the equivalent console commands
  • The GUI and console provide similar features, functionality, output, and reports
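
For automated profiling, a script can assemble the console command itself. The sketch below builds an invocation of the `ncu` command line tool as an argument list. The helper name, application path, report name, and kernel name are hypothetical. The options used (`--set`, `-o`, `--kernel-name`, `--launch-count`) are as documented for the Nsight Compute CLI, but it is worth checking them against `ncu --help` for the installed version.

```python
import shlex


def build_ncu_command(app, report_name, kernel_name=None, launch_count=None):
    """Assemble an `ncu` invocation as an argument list (hypothetical helper)."""
    # Collect the full section set and write the result to a report file.
    cmd = ["ncu", "--set", "full", "-o", report_name]
    if kernel_name is not None:
        # Restrict profiling to kernels with this name.
        cmd += ["--kernel-name", kernel_name]
    if launch_count is not None:
        # Limit the number of profiled kernel launches.
        cmd += ["--launch-count", str(launch_count)]
    cmd.append(app)
    return cmd


# Placeholder application, report, and kernel names.
command = build_ncu_command("./my_app", "my_report", kernel_name="my_kernel", launch_count=1)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` on a machine where Nsight Compute is installed. Keeping it as a list avoids shell quoting problems.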

CUDA Task Graph Profiling

  • Stop at a kernel launch from a graph node
  • State of graph node shown in resource page
  • Export graph visualization

Source Code Correlation

  • Correlate individual Source, SASS, or PTX lines and metrics
  • Includes PC Sampling data, available on Volta and Turing architectures
  • Heat map for identifying high metric values

Nsight Compute integrates into Visual Studio using NVIDIA Nsight Integration.
Visual Studio project settings are transferred to Nsight Compute.

Other Features

  • Interactive kernel profiler
  • Profiler report for kernels and/or child processes
  • Diffing results across one or more reports using baselines
  • Fast data collection
  • Intuitive UI for interactive profiling
  • Command line operation for manual and automated profiling
  • Fully customizable reports and rules

  • This version is a reposting of the version in the CUDA Toolkit 11.8.
  • A macOS host download is available on this site, but is not included in the CUDA Toolkit.
  • We may update this site with bug fixes, as needed.

System Requirements

    Supported platforms

      Host
      • Linux x86_64[1]
      • Linux aarch64 sbsa[1]
      • Linux aarch64 (L4T)[2]
      • Windows x86_64[1]
      • macOS[1]
      Target
      • Linux x86_64[1]
      • Windows x86_64[1]
      • Linux PowerPC[1]
      • Linux aarch64 sbsa[1]
      • Linux aarch64 (L4T)[2]
      • DRIVE OS QNX aarch64[2][3]
      • DRIVE OS Linux aarch64[2][3]

    Supported NVIDIA GPU architectures

    • Hopper: GH100
    • Ada: AD10x
    • Ampere: A100 with Multi-Instance GPU, GA10x
    • Turing: TU1xx
    • Volta: GV100[1], GV10B[2]

    Drivers

      Use the following drivers, provided with the CUDA Toolkit 11.8 production release, or a more recent version:
      • 521.98 (Windows)
      • 520.61.03 (Linux)
    [1] Available in this download and the CUDA Desktop Toolkit.
    [2] Available in the Embedded or DRIVE toolkits only.
    [3] Only the command line interface (CLI) is provided for these platforms. There is no Nsight Compute GUI application for these platforms.

    DLI Course: Optimizing CUDA Machine Learning Codes with Nsight Profiling Tools

    Use Nsight Compute to interactively profile and analyze individual CUDA kernels, optimizing them based on your findings. Combine the use of Nsight Systems and Nsight Compute into an effective optimization workflow for many GPU-accelerated machine learning applications.

    Documentation, Videos, and Blogs

    Nsight Compute Documentation
    Videos
    Blogs

    Support

    To provide feedback, request additional features, or report Nsight Compute issues, please use the Developer Forums.