NVIDIA Nsight Compute
NVIDIA® Nsight™ Compute is an interactive profiler for CUDA® and NVIDIA OptiX™ that provides detailed performance metrics and API debugging via a user interface and command-line tool. Users can run guided analysis and compare results with a customizable and data-driven user interface, as well as post-process and analyze results in their own workflows.
NVIDIA Nsight Compute is also available as part of the CUDA Toolkit.
Watch an overview video about how guided analysis in Nsight Compute assists CUDA kernel optimizations.
Highlighting GPU throughput, warp state statistics, and source code correlation.
Profile CUDA and OptiX
For developing with CUDA or OptiX, application-level performance tuning is just the beginning of GPU optimization. When a deeper dive into compute processes is needed, it’s crucial to have both visibility into hardware activity and the level of understanding required to optimize it. With NVIDIA Nsight Compute, you don’t have to be a hardware architecture expert to do this; Nsight Compute is a CUDA and OptiX profiler that detects performance issues, displays them intuitively, and delivers built-in guidance from NVIDIA engineers on how to resolve them.
Leverage NVIDIA’s Insight
Nsight Compute is designed to assist with the demanding task of kernel profiling through a powerful set of tools bundled with NVIDIA’s own insights. By visualizing hardware performance metrics, it translates traditionally cryptic values into actionable information. The detail that Nsight Compute uncovers is ordered hierarchically, so that memory utilization can be correlated down to individual lines of source code. Built into every step of the process, guided analysis from NVIDIA’s own rule set identifies common performance limiters and offers optimization advice.
Customize and Collaborate
For expert users, Nsight Compute can be extended with custom metric collection and analysis workflows. For cross-platform development, baseline comparisons reveal performance variations between different GPU architectures. For collaboration, dependencies and source information can be imported into the report and shared with colleagues and teams. Profiling can be conducted through the Nsight Compute GUI or the CLI, on the local device or remotely. Python developers can leverage the NVRules API for automating analysis. Nsight Compute offers a wide range of options for different development areas, experience levels, and project sizes.
Explore Key Features
Find optimizations with guided analysis.
Nsight Compute’s report pages provide insight into all aspects of a profile. The details page offers metrics that address overall GPU utilization and show how performance is connected to various hardware concepts, and it concludes with recommended optimization actions. Insights into performance problems and solutions from NVIDIA’s best practices are provided along the way via guided analysis. Baseline comparisons provide feedback directly in the tool, showing the effect of any change to the workload.
The details page raises flags on low GPU throughput and automatically detects performance limiters that are the potential source.
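The idea behind a baseline comparison can be illustrated with a minimal Python sketch. This is not Nsight Compute's implementation or API: the function name `compare_to_baseline`, the metric names, and the values are all hypothetical, chosen only to show how a relative change per metric might be computed between two runs.

```python
def compare_to_baseline(baseline, current):
    """Return the percent change of each shared metric relative to the baseline."""
    changes = {}
    for name, base_value in baseline.items():
        if name not in current or base_value == 0:
            continue  # skip metrics that are missing or cannot be normalized
        changes[name] = (current[name] - base_value) / base_value * 100.0
    return changes


# Made-up values for illustration only; not measurements from any real GPU.
baseline = {"duration_us": 200.0, "dram_throughput_pct": 40.0}
current = {"duration_us": 150.0, "dram_throughput_pct": 50.0}

for name, pct in compare_to_baseline(baseline, current).items():
    print(f"{name}: {pct:+.1f}%")
```

With these illustrative numbers, the kernel duration drops by 25% while the memory throughput figure rises by 25%, which is the kind of before/after signal a baseline is meant to surface.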
Memory chart visualizing data transfer, where pipelines are colored with a heatmap based on their utilization.
Inspect memory workload.
Memory workload analysis builds a visualization of memory transfer sizes and throughput on the profiled architecture, as well as a guide for improving performance. Heatmaps allow users to intuitively understand potential bottlenecks and under-utilizations in the memory pipeline. Detailed tables for each hardware unit enable insight into the path from originating instruction to executed memory access.
Correlate source code with detailed instruction metrics.
Nsight Compute supports correlating efficiency metrics down to the individual lines of code that contribute to them. This includes connecting assembly (SASS) with PTX and higher-level code, such as CUDA C/C++, Fortran, OpenACC, or Python. A heatmap visualization highlights lines with high metric values so that problematic areas can be located quickly. Warp stall sampling identifies latency and inefficiency issues, while instruction execution metrics indicate expensive code locations. This level of detail makes it possible to tune performance precisely.
Metrics corresponding to individual lines of code being profiled in the source page.
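As a rough illustration of source correlation, the following Python sketch sums per-instruction sample counts by the source line they map to and ranks the lines. It is only a conceptual model under stated assumptions: the `(source_line, sample_count)` input format, the function name `hottest_lines`, and the sample data are hypothetical and do not reflect Nsight Compute's report format or API.

```python
from collections import Counter


def hottest_lines(samples, top=3):
    """Sum sample counts per source line and return the highest-ranked lines."""
    totals = Counter()
    for source_line, count in samples:
        # Several instructions can map to the same source line.
        totals[source_line] += count
    return totals.most_common(top)


# Hypothetical (source_line, sample_count) pairs, one per sampled instruction.
samples = [(12, 5), (14, 40), (14, 22), (20, 3), (12, 1)]

for line, count in hottest_lines(samples, top=2):
    print(f"line {line}: {count} samples")
```

Here line 14 accumulates 62 samples from two instructions and ranks first, mirroring how a per-line view can point to the code that deserves attention.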
A CUDA graph visualizing how nodes are configured and connected.
Utilize CUDA graphs and interactive profiling.
Interactive profiling creates a live session where application state can be viewed dynamically and full control of the target is preserved. This allows you to step API calls, inspect resources, or experiment with different kernel configurations to readily make performance comparisons. Explore and export CUDA graphs to understand how they are connected and profile individual nodes or the entire graph with detailed hardware metrics.
Uplift OptiX development.
Nsight Compute features a versatile built-in acceleration structure viewer for analyzing applications that use NVIDIA’s RT Cores via the OptiX API. Viewing scene geometries, intersections, and heatmaps enables quick identification of unnecessary overlap that could cause performance issues or make traversal operations behave differently than expected.
Acceleration structure viewer with a hierarchical view on the left, a graphical view in the middle, and control options on the right.
View Other Tools Within the Nsight Suite
Nsight Compute is part of the NVIDIA Nsight Developer Tools suite, a collection of powerful tools, libraries, and SDKs that enable developers to build, debug, and profile software utilizing the latest accelerated computing hardware.
Nsight Graphics
NVIDIA Nsight™ Graphics is a standalone developer tool with ray-tracing support that enables you to debug, profile, and export frames built with Direct3D, Vulkan, OpenGL, OpenVR, and the Oculus SDK.
Nsight Deep Learning Designer
Nsight DL Designer is an integrated development environment that helps developers efficiently design and develop deep neural networks for in-app inference.
Nsight Systems
NVIDIA Nsight™ Systems is a system-wide performance analysis tool designed to visualize an application’s algorithms, help you identify the largest opportunities to optimize, and tune to scale efficiently across any quantity or size of CPUs and GPUs, from large servers to our smallest system on a chip (SoC).