NVIDIA Nsight Compute 2019.3

NVIDIA® Nsight™ Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging through both a user interface and a command line tool. In addition, its baseline feature allows users to compare results within the tool. Nsight Compute provides a customizable, data-driven user interface and metric collection, and it can be extended with analysis scripts for post-processing results.



NVIDIA® Nsight™ Compute is offered free of charge through the NVIDIA Registered Developer Program and as part of the CUDA Toolkit.

Baseline Comparisons

Set multiple baselines to compare variations in GPU architecture, kernel launch parameters, memory usage, ...
Compare performance metrics between baselines and the current run
New: Ability to compare child processes
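The kind of comparison a baseline enables can be sketched as a per-metric relative difference between a baseline result and the current run. This is a minimal illustration under our own assumptions, not Nsight Compute's internal calculation: the function names are hypothetical, results are modeled as plain dictionaries of metric name to value, and the sample values are made up.

```python
from typing import Dict, Optional


def relative_diff_pct(baseline: float, current: float) -> Optional[float]:
    """Percent change of `current` relative to `baseline`; None if baseline is 0."""
    if baseline == 0:
        return None  # a relative change is undefined against a zero baseline
    return (current - baseline) / baseline * 100.0


def compare_to_baseline(baseline: Dict[str, float],
                        current: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Diff every metric that appears in both the baseline and the current run."""
    return {name: relative_diff_pct(baseline[name], value)
            for name, value in current.items() if name in baseline}


if __name__ == "__main__":
    # Hypothetical kernel durations (ns) from a baseline and a tuned run.
    base = {"gpu__time_duration.sum": 200.0}
    run = {"gpu__time_duration.sum": 150.0}
    print(compare_to_baseline(base, run))
```

Here the tuned run shows a -25% change in duration relative to the baseline. Metrics missing from either side are skipped rather than guessed.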

Run from the Nsight Compute GUI or from the Console Command Line

The Nsight Compute GUI provides the text of the equivalent console commands
The GUI and console provide similar features, functionality, output, and reports

CUDA 10.1 Update 1 Task Graph Profiling

Stop at a kernel launch from a graph node
State of the graph node shown on the resource page
Export graph visualization

Source Code Correlation

Correlate individual source, SASS, or PTX lines with metrics
Includes PC Sampling data, available on Volta and Turing architectures
New: Improved heat map for identifying high metric values

Features

Interactive kernel profiler
Profiler report for kernels and/or child processes
Diffing results across one or more reports using baselines
Fast data collection
Intuitive UI for interactive profiling
Command line operation for manual and automated profiling
Fully customizable reports and rules

Changes from the Nsight Compute 2019.3.0 version found in CUDA Toolkit 10.1 Update 1

New 'Send Feedback...' button under 'Help' menu

Fixed calculation of theoretical occupancy for grids with blocks that are not a multiple of 32 threads
Fixed intercepting child processes launched through Python's subprocess.Popen() class on Linux
Fixed an issue where NVTX push/pop ranges did not show up for child threads when using the Nsight Compute command line tools
Fixed the description in the rule covering the IMC stall reason
Fixed cases where baseline values were not correctly calculated when comparing reports of different architectures
Fixed accessing instanced metrics in the NvRules API
Fixed a bug that could cause the collection of unnecessary metrics in the Interactive Profile activity
Fixed potential crash on exit of the profiled target application
Switched the underlying metric for SOL FB in the GPU Speed Of Light section so that it is driven by dram__throughput.avg.pct_of_peak_sustained_elapsed instead of fbpa__throughput.avg.pct_of_peak_sustained_elapsed
Improved performance for metric lookups on the Source page
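The occupancy fix above concerns blocks whose thread count is not a multiple of 32. Warps are allocated whole, so a partial warp still takes a full warp slot. The sketch below is a simplified model that is our own assumption, not Nsight Compute's calculator: it considers only the warp and block limits per SM and ignores register and shared-memory limits and allocation granularities. The default limits (64 warps and 32 blocks per SM) are the compute capability 7.0 (Volta) values; pass other values for other architectures.

```python
import math

WARP_SIZE = 32


def warps_per_block(threads_per_block: int) -> int:
    """Warps occupied by one block; a partial warp still uses a full warp slot."""
    if not 1 <= threads_per_block <= 1024:
        raise ValueError("threads_per_block must be in [1, 1024]")
    return math.ceil(threads_per_block / WARP_SIZE)


def theoretical_occupancy(threads_per_block: int,
                          max_warps_per_sm: int = 64,
                          max_blocks_per_sm: int = 32) -> float:
    """Simplified theoretical occupancy limited only by warp and block slots."""
    wpb = warps_per_block(threads_per_block)
    # Resident blocks are capped by the block limit and by available warp slots.
    blocks = min(max_blocks_per_sm, max_warps_per_sm // wpb)
    return blocks * wpb / max_warps_per_sm


if __name__ == "__main__":
    for threads in (16, 48, 96, 256):
        print(threads, warps_per_block(threads), theoretical_occupancy(threads))
```

With 16-thread blocks, each block still occupies one full warp, so 32 resident blocks yield 32 of 64 warps (0.5). Treating the block as half a warp would wrongly give 0.25.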
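To inspect the metric that now drives SOL FB outside the GUI section, you can request it by name from the command line tool. Below is a minimal Python sketch that builds the argument list for the Nsight Compute 2019.x CLI (`nv-nsight-cu-cli`) using its `--metrics` and `-o` options. The helper name, the `./my_app` target, and the report name are hypothetical. Collecting the old `fbpa__` metric alongside the new one assumes both are available on your GPU.

```python
from typing import List, Sequence

# Command line tool name used by Nsight Compute 2019.x.
CLI = "nv-nsight-cu-cli"

# Metric that now drives SOL FB in the GPU Speed Of Light section.
SOL_FB_METRIC = "dram__throughput.avg.pct_of_peak_sustained_elapsed"
# Metric that drove SOL FB before this change.
OLD_SOL_FB_METRIC = "fbpa__throughput.avg.pct_of_peak_sustained_elapsed"


def build_metric_command(app: Sequence[str], report: str,
                         metrics: Sequence[str]) -> List[str]:
    """Return an argv list that profiles `app`, collecting `metrics` into `report`."""
    if not app:
        raise ValueError("app must contain at least the executable path")
    if not metrics:
        raise ValueError("at least one metric is required")
    # --metrics takes a comma-separated list; -o names the exported report.
    return [CLI, "--metrics", ",".join(metrics), "-o", report, *app]


if __name__ == "__main__":
    cmd = build_metric_command(["./my_app"], "sol_fb",
                               [SOL_FB_METRIC, OLD_SOL_FB_METRIC])
    print(" ".join(cmd))
    # To actually profile (requires Nsight Compute on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building an argument list instead of a shell string avoids quoting problems when the application takes its own arguments.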

System Requirements

Supported platforms

Host

Linux x86_64^[1]
Windows x86_64^[1]
macOS^[1]

Target

Linux x86_64^[1]
Windows x86_64^[1]
DRIVE OS QNX aarch64^[2][3]
DRIVE OS Linux aarch64^[2][3]

     [1] Available in this download and in the CUDA Desktop Toolkit
     [2] Available in the Embedded or DRIVE toolkits only
     [3] Only the command line interface (CLI) is provided for these platforms; there is no Nsight Compute GUI application for them.

Supported GPU architectures

Pascal: GP10x (excluding GP100)
Volta: GV100
Turing: TU1xx

Drivers

Please use the 425.25 driver provided with the CUDA Toolkit 10.1 Update 1 production release, or a more recent version.

Documentation

Nsight Compute Documentation

Support

To provide feedback, request additional features, or report Nsight Compute issues, please use the Developer Forums.