NVIDIA® Nsight™ Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and command line tool. In addition, its baseline feature allows users to compare results within the tool. Nsight Compute provides a customizable and data-driven user interface and metric collection and can be extended with analysis scripts for post-processing results.

Version 2019.3 New Features  |  Revision History

NVIDIA® Nsight™ Compute is offered free of charge through the NVIDIA Registered Developer Program and as part of the CUDA Toolkit.

Baseline Comparisons

  • Set multiple baselines to compare variations in GPU architecture, kernel launch parameters, memory usage, ...
  • Compare performance metrics between baselines and the current run
  • New: Now with the ability to compare child processes
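
As a rough illustration of what a baseline comparison computes, the sketch below takes the metrics of a baseline run and a current run and reports the relative change of each shared metric. The function names, metric names, and sample values are all hypothetical and chosen only for this example; this is not Nsight Compute's implementation.

```python
def percent_change(baseline, current):
    """Return the change of `current` relative to `baseline`, in percent.

    Returns None when the baseline is zero, since the ratio is undefined.
    """
    if baseline == 0:
        return None
    return (current - baseline) / baseline * 100.0


def compare_to_baseline(baseline_metrics, current_metrics):
    """Compare only the metrics present in both runs (keys are metric names)."""
    shared = baseline_metrics.keys() & current_metrics.keys()
    return {
        name: percent_change(baseline_metrics[name], current_metrics[name])
        for name in sorted(shared)
    }


# Hypothetical values for two runs of the same kernel.
baseline_run = {"duration_us": 200.0, "achieved_occupancy_pct": 50.0}
current_run = {"duration_us": 150.0, "achieved_occupancy_pct": 62.5}

diff = compare_to_baseline(baseline_run, current_run)
for name, change in diff.items():
    print(f"{name}: {change:+.1f}%")
```

A negative value means the metric decreased relative to the baseline. Whether that is an improvement depends on the metric: a lower duration is better, a lower occupancy usually is not.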

Run from the Nsight Compute GUI or from the Console Command Line

  • The Nsight Compute GUI provides the text for console commands
  • GUI/Console provide similar features, functionality, output, and reports
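
For automated profiling, the console command can also be assembled and launched from a script. The sketch below builds such a command line in Python. It assumes the command line tool is named `nv-nsight-cu-cli` and accepts `--export`, `--metrics`, and `--target-processes all`; these names should be checked against the tool's `--help` output for the installed version. The helper `build_profile_command`, the application path `./my_app`, and the report name are hypothetical.

```python
import shlex


def build_profile_command(app, app_args=(), report=None, metrics=(),
                          profile_children=False):
    """Assemble an argument list for the Nsight Compute command line tool.

    The executable and flag names are assumptions; verify them with --help.
    """
    cmd = ["nv-nsight-cu-cli"]  # assumed executable name
    if report:
        cmd += ["--export", report]  # assumed flag: write a report file
    if metrics:
        cmd += ["--metrics", ",".join(metrics)]  # assumed flag: metric list
    if profile_children:
        cmd += ["--target-processes", "all"]  # assumed flag: include children
    cmd.append(app)
    cmd.extend(app_args)
    return cmd


cmd = build_profile_command(
    "./my_app",
    ["--size", "1024"],
    report="my_report",
    metrics=["dram__throughput.avg.pct_of_peak_sustained_elapsed"],
    profile_children=True,
)

# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where the tool is installed. Keeping the arguments as a list rather than a single string avoids shell quoting problems.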

CUDA 10.1 Update 1 Task Graph Profiling

  • Stop at a kernel launch from a graph node
  • State of graph node shown in resource page
  • Export graph visualization

Source Code Correlation

  • Correlate individual Source, SASS, or PTX lines and metrics
  • Shown here with PC Sampling data available in Volta and Turing architectures
  • New: Improved heat map for identifying high metric values


  • Interactive kernel profiler
  • Profiler report for kernels and/or child processes
  • Diffing results across one or more reports using baselines
  • Fast data collection
  • Intuitive UI for interactive profiling
  • Command line operation for manual and automated profiling
  • Fully customizable reports and rules

Changes relative to Nsight Compute 2019.3.0, the version found in CUDA Toolkit 10.1 Update 1

    2019.3.1 Improvements:
    • New 'Send Feedback...' button under 'Help' menu
    2019.3.1 Resolved Issues:
    • Fixed calculation of theoretical occupancy for grids whose block size is not a multiple of 32 threads
    • Fixed intercepting child processes launched through Python's subprocess.Popen() class on Linux
    • Fixed issue of NVTX push/pop ranges not showing up for child threads when using Nsight Compute command line tools
    • Fixed description in rule covering the IMC stall reason
    • Fixed cases where baseline values were not correctly calculated when comparing reports of different architectures
    • Fixed accessing instanced metrics in the NvRules API
    • Fixed a bug that could cause the collection of unnecessary metrics in the Interactive Profile activity
    • Fixed potential crash on exit of the profiled target application
    • Switched underlying metric for SOL FB in the GPU Speed Of Light section to be driven by dram__throughput.avg.pct_of_peak_sustained_elapsed instead of fbpa__throughput.avg.pct_of_peak_sustained_elapsed
    • Improved performance for metric lookups on the Source page
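
The occupancy fix above concerns block sizes that are not a multiple of 32. The sketch below shows why such sizes need care: threads are scheduled in warps of 32, so a partially filled warp still takes a full warp slot. This is a simplified model that considers only the per-SM warp and block limits and ignores register and shared memory limits. The function names are hypothetical, the limits are passed in as example values, and this is not the tool's actual calculation.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warps_per_block(threads_per_block):
    """Round up: a partially filled warp still occupies a full warp slot."""
    return -(-threads_per_block // WARP_SIZE)  # ceiling division


def theoretical_occupancy(threads_per_block, max_warps_per_sm, max_blocks_per_sm):
    """Fraction of an SM's warp slots that resident blocks can fill.

    Simplified: only the warp and block limits are modeled.
    """
    warps = warps_per_block(threads_per_block)
    blocks = min(max_blocks_per_sm, max_warps_per_sm // warps)
    return blocks * warps / max_warps_per_sm


# Example limits (illustrative): 64 warps and 32 blocks per SM.
for threads in (32, 48, 96, 128):
    occ = theoretical_occupancy(threads, max_warps_per_sm=64, max_blocks_per_sm=32)
    print(f"{threads:4d} threads/block -> {warps_per_block(threads)} warps, "
          f"occupancy {occ:.3f}")
```

A block of 48 threads counts as 2 warps here, not 1.5 or 1. Truncating instead of rounding up would misstate how many blocks fit on an SM.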

System Requirements

Supported platforms


Host:

  • Linux x86_64[1]
  • Windows x86_64[1]
  • macOS[1]

Target:

  • Linux x86_64[1]
  • Windows x86_64[1]
  • DRIVE OS QNX aarch64[2][3]
  • DRIVE OS Linux aarch64[2][3]
     [1] Available in this download and in the CUDA Desktop Toolkit.
     [2] Available in the Embedded or DRIVE toolkits only.
     [3] Only the command line interface (CLI) is provided for these platforms; there is no Nsight Compute GUI application for them.

Supported GPU architectures

  • Pascal: GP10x (excluding GP100)
  • Volta: GV100
  • Turing: TU1xx



Nsight Compute Documentation


To provide feedback, request additional features, or report Nsight Compute issues, please use the Developer Forums.