Get Started With Nsight Systems
Download NVIDIA Nsight Systems
Nsight Systems 2025.2.1 is Available Now
Review the supported platforms for NVIDIA
Nsight™ Systems to choose the correct version for your host and profiling target.
If profiling from the CLI, pick your platform based on where the CLI will be run. If using the GUI (Full Version) to view reports, do profiling, or do remote profiling, pick your platform based on the host PC architecture where the GUI will be run.
Also review the system requirements before
downloading.
Desktop, workstation, and server platforms:
This download is for local and remote profiling of Windows and Linux servers, workstations, and gaming PCs. Profiling is supported on x86-64 architectures.
See the supported platforms for specifics about combinations of local, remote, and mixed-OS compatibilities.
Download:
Nsight Systems 2025.2.1 Full Version
Nsight Systems 2025.2.1 CLI Only
Nsight Systems 2025.2.1 Arm Servers and NVIDIA Grace Full Version
Nsight Systems 2025.2.1 Arm Servers and NVIDIA Grace CLI Only
Download Nsight Systems 2025.2.1 macOS Host
This platform only supports viewing reports collected from a CLI or remotely profiling Linux laptops, desktops, workstations, and servers.
See the supported platforms.
Kubernetes integration:
The Nsight Tools Sidecar Injector enables your containerized applications to be profiled by NVIDIA Nsight tools (currently, only Nsight Systems). It uses a Kubernetes dynamic admission controller to automatically add an init container, volumes containing Nsight Systems and its configuration, environment variables, and a security context to your Pod.
JupyterLab integration:
The Nsight Tools JupyterLab Extension allows you to profile cells and notebooks in Jupyter, including detailed analysis with the full Nsight Systems GUI.
Embedded and automotive platforms:
Nsight Systems is bundled as part of the Jetson development suite in the NVIDIA JetPack™ SDK.
Nsight Systems is bundled as part of DRIVE OS for development and deployment on NVIDIA DRIVE AGX™-based autonomous vehicles.
View Nsight Systems documentation.
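Once a package is installed, a first command-line collection typically looks like the sketch below; the application name is illustrative, and `nsys profile --help` lists the full set of options.

```
# Trace CUDA, NVTX, and OS runtime activity from a sample application and
# write the result to my_report.nsys-rep for the GUI or `nsys stats`.
nsys profile --trace=cuda,nvtx,osrt --output=my_report ./my_app
```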
Supported Platforms
Nsight Systems is distributed through multiple packages. Pick a “Profiling Target” column to see which host platforms can profile that target (locally or remotely) and view its reports.
| From Host \ Profiling Target | Linux Workstations & Servers | Windows Workstations & Gaming PCs | NVIDIA DPUs & SuperNICs | Jetson & IGX | DRIVE |
|---|---|---|---|---|---|
| Windows | Remote GUI*, Report Viewer** | Local CLI & GUI, Remote GUI*, Report Viewer** | Remote CLI, Remote Report Viewer | Remote Report Viewer | Remote Report Viewer |
| Mac | Remote GUI*, Report Viewer** | Remote Report Viewer** | Remote Report Viewer | Remote Report Viewer | Remote Report Viewer |
| Linux | Local CLI & GUI, Remote GUI*, Report Viewer** | Remote GUI, Report Viewer** | Remote CLI, Remote Report Viewer | Remote GUI, Report Viewer*** | Remote GUI, Report Viewer*** |
| DPU / SuperNIC | N/A | N/A | Local CLI | N/A | N/A |
| Jetson | N/A | N/A | N/A | Local CLI & GUI, Report Viewer*** | N/A |
| DRIVE | N/A | N/A | N/A | N/A | Local CLI |
* For x86-64 targets only, or for opening reports collected from a CLI
** Only for reports collected from Windows or Linux PCs & servers with the same or an earlier version
*** Only for reports collected from Jetson or DRIVE OS with the same or an earlier version
System Requirements
Nsight Systems is compatible with Windows workstations and PCs, Linux workstations and servers, as well as Jetson and NVIDIA DRIVE autonomous machines. Learn about the system requirements and support for your development platform below.
|  | Windows Workstations & PCs | Linux Workstations & Servers (x86-64) | Linux Arm SBSA Servers & NVIDIA Grace | Jetson & DRIVE |
|---|---|---|---|---|
| Operating Systems | Windows 10 or newer |  |  | Jetson Linux, DRIVE OS |
| Target Hardware | GPU: Pascal or newer; CPU: x86-64 processors | GPU: Pascal or newer; CPU: x86-64 processors** | GPU: Pascal or newer; Arm-SBSA servers | NVIDIA IGX, Jetson AGX Orin, Jetson AGX Xavier, Jetson TX2, Jetson TX1, DRIVE AGX Orin, DRIVE AGX Pegasus, DRIVE AGX Xavier, DRIVE PX Parker AutoChauffeur, DRIVE PX Parker AutoCruise |
| Target Software | 64-bit applications only; CUDA 10.0+ for CUDA trace; Driver 418 or newer*** | 64-bit applications only; CUDA 10.0+ for CUDA trace; Driver 418 or newer*** | 64-bit applications only; CUDA 10.0+ for CUDA trace; Driver 418 or newer*** |  |
| Local Profiling | CLI and GUI | CLI and GUI | CLI and GUI | CLI (all platforms), GUI (Jetson Linux only) |
| Remote Profiling From Platforms | Windows 10+, macOS 11+, Ubuntu 20.04+ | Windows 10+, macOS 11+, Ubuntu 20.04+ | N/A | Ubuntu 22.04 |
* For older OS versions, please use Nsight Systems 2020.3
** Intel Haswell architecture or newer is required for LBR sampling backtrace
*** Driver 535 and newer improves GPU profiling stability. Please use the latest driver for the best results. Download here.
Release Notes
2025.2.1
- Highlights:
- Dask API trace
- PyTorch enhancements
- Python 3.13 support
- CUDA trace enhancements (see the example command after this list)
- (Beta) Hardware-based low-overhead CUDA trace for NVIDIA Blackwell (--trace=cuda-hw)
- GPU Direct Storage trace (--trace=gdc)
- CUDA device side event trace (--cuda-event-trace)
- Graph trace improvements
- Kernel CGA dimensions & policy
- Stream priority in tooltips
- NVIDIA Confidential Compute support improvements
- Windows graphics trace enhancements
- GPU Frame Duration for DLSS Frame Generation
- GPU resource trace tracks pre-start allocation names
- Graphics Hotspot Analysis recipe
- Linux system trace enhancements
- Syscall trace enhancements (requires CAP_BPF and CAP_PERFMON)
- Support system-wide mode (--syscall=pid-namespace)
- Collect backtraces
- OS Runtime Trace (OSRT) VFS POSIX functions trace (--osrt-file-access=true)
- NVIDIA Grace support enhancements
- Topdown analysis recipe for PMU events based on NVTX range annotations
- Updates to available counters & metrics
- NVIDIA Tools Extension (NVTX) API & support enhancements
- Various bug fixes
- Payloads Extensions
- Counters Extensions
- Deferred Events Extensions
- Updates bundled within NVIDIA Nsight Systems and on GitHub
- NVIDIA Nsight Systems Plugins
- Last-chance callback to submit NVTX deferred events on stop
- Windows support
- GUI improvements
- macOS GUI now available for arm64
- Go to range — a timeline toolbar to quickly jump to the longest, shortest, and median ranges
- NVIDIA Nsight Streamer is now available on NGC for viewing reports on remote headless servers
- NVIDIA Nsight Operator releasing soon on NGC for Kubernetes
- Learn more here and apply for early access features
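A hedged sketch combining the CLI switches quoted in these notes: the flag names are taken verbatim from the list above, while the boolean value syntax and whether they can be combined are assumptions to verify against `nsys profile --help`.

```
# Hardware-based CUDA trace (beta, NVIDIA Blackwell) together with the
# device-side CUDA event trace and OSRT file-access trace named above.
nsys profile --trace=cuda-hw \
             --cuda-event-trace=true \
             --osrt-file-access=true \
             --output=blackwell_trace ./my_app
```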
2025.1.1
- Highlights:
- CUDA 12.8 support
- Keep last N seconds CLI option: to retain the most relevant data when trying to record an unpredictable event, use the --keep=N option with the ‘nsys stop’ command (see the example after this list)
- Recipes
- Summary of GPU metrics samples per range (NVTX or CUDA kernels)
- PyTorch
- Command-line option for enabling PyTorch autograd layer NVTX ranges
- Command-line option to trace prominent PyTorch Python functions
- Windows
- Graphics resource tracker now includes resource priority changes
- Reduced memory overhead when generating reports that contain ETW data
- Python 3.12 support for scripts & recipes
- Linux self-unpacking .run installer is now available for Arm64 SBSA systems
- Preview: Multi-pass script to run all CPU counters involved in Arm Top-Down analysis
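A minimal sketch of the keep-last-N-seconds workflow described above, assuming the default interactive session; the application name and the 10-second window are illustrative.

```
# Launch the target under an Nsight Systems session, start an open-ended
# collection, then keep only the most recent 10 seconds when stopping.
nsys launch ./my_app &
nsys start --output=ring_capture
# ... wait for the unpredictable event to occur ...
nsys stop --keep=10
```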
2024.7.1
- Highlights:
- Storage profiling functionality expanded
- Lustre and NFS volumes performance metrics on the client
- Local disks and NVMe-oF volumes performance metrics on the client
- Recipe for storage utilization per volume (see the invocation sketch after this list):
- Cumulative throughput line graphs
- Throughput heatmaps
- Statistical analysis of MPI communication parameters
- Installer package variant added for NVIDIA Grace - .run
- Option to keep only last N seconds when generating a report file
- Option to export a subsection of the timeline into a new report file
- Enable "Resolve Symbols when creating report" by default on Windows and resolve only the symbols of the most relevant process
2024.6.1
- Highlights:
- DirectX12 memory resource trace improvements:
- New chart displays the distribution of DirectX12 resource types allocated by the target process in VRAM, across the trace session.
- Storage profiling (beta)
- Lustre & NFS - Performance metrics and operation counters for mounted volumes. (beta)
- NVMe-oF (NVMe over Fabrics) - Performance metrics from mounted volumes. (beta)
- Local storage - Performance metrics from local devices (beta)
- Example use-cases
- Verify GPU compute & storage transfer concurrency
- Identify GPU compute idle time caused by storage transfer dependencies
2024.5.1
- Highlights:
- GPU metrics enhancements
- Compute triage for NVIDIA H100 (preview)
- Sync and async copy engine activity
- Python call-stack sample statistics tables
- OS System-Call trace (beta)
- GPU power metrics sampler (preview)
- NFS metrics sampler (beta)
- Net Interface metrics sampler and plugin example code (beta)
- CUDA trace support for devices with attribute cudaDevAttrD3D12CigSupported
- New analysis recipes
- NVIDIA NIC traffic statistics
- MPI communication parameters statistics
- Windows memory resources trace improvements
- System-wide committed VRAM timeline chart
- Additional memory transfer states: pending and in progress
- Vulkan device memory object names
- Vulkan bound resources names
- Quality of life improvements
- Windows symbols resolver now faster & limited to relevant processes
- Windows ETL import now includes page fault events
- Graphics frame health tooltips add thread IDs of API calls
- Additional Details:
- GPU metrics enhancements
  A new metrics set is introduced for NVIDIA H100, focused on compute triage. This set contains far more metrics, so it must be collected at a lower sampling rate or for a shorter duration; otherwise the buffers will overflow.
  Copy engines are also exposed in the general metrics set to better illustrate GPU activity on architectures such as NVIDIA Ada. Synchronous copy engines are used in graphics command sequences; async copy engines are used by both compute and graphics to copy resources, typically in the background.
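  A hedged sketch of collecting GPU metrics from the CLI: the `--gpu-metrics-device`, `--gpu-metrics-set`, and `--gpu-metrics-frequency` flags reflect commonly documented option names but may vary by version, and the set identifier and reduced 5 kHz rate are illustrative of the guidance above.

```
# Sample GPU metrics on all GPUs at a reduced rate to make room for a larger
# metrics set; the set identifier is a placeholder.
nsys profile --gpu-metrics-device=all \
             --gpu-metrics-set=<h100-triage-set> \
             --gpu-metrics-frequency=5000 \
             --output=h100_triage ./my_app
```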
- Python call-stack sample statistics tables
  Initial Python call-stack sampling support was presented only as markers on the timeline. Call-stack samples from C, C++, and ELF binaries have historically also been shown in function statistics tables for exploring the frequency of function calls and stack paths. These same views are now available for Python as well.
- OS System-Call trace (beta)
  A system-wide alternative to OS runtime API trace (OSRT). OS runtime APIs are typically implemented with system calls (syscall instructions); system-call trace collects syscall events from kernel space. Sudo or CAP_SYS_ADMIN is typically required.
- GPU power metrics sampler (preview)
  Low-frequency sampling of GPU power and temperature, for users interested in understanding when clocks are adjusted due to temperature or in optimizing for power.
- NFS metrics sampler (beta)
  See how network file system activity may relate to idle GPU time. Accessing data over the network just in time can introduce delays compared with prefetching it.
- Net Interface metrics sampler and plugin example code (beta)
  Building on the existing networking support for NVIDIA NICs and AWS EFA, this samples throughput on Linux network interfaces. These can be physical devices or virtual adapters used by mechanisms such as containers and virtual machines. I/O is sometimes on the critical path of computation instead of running asynchronously, or is not started early enough to avoid blocking the critical path.
Feature Table
Feature | Linux Workstations and Servers | Windows Workstations and Gaming PCs | Jetson Autonomous Machines | DRIVE Autonomous Vehicles |
---|---|---|---|---|
View system-wide application behavior across CPUs and GPUs | ||||
CPU cores utilization, process, & thread activities | yes | yes | yes | yes |
CPU thread periodic sampling backtraces | yes* | yes | yes | yes |
CPU thread blocked state backtraces | yes** | yes | yes | yes |
CPU performance metrics | yes | no | yes | yes |
GPU workload trace | yes | yes | yes | yes |
GPU context switch trace | yes | yes | yes | yes |
SOC hypervisor trace | - | - | - | yes |
SOC memory bandwidth sampling | - | - | yes | yes |
SOC Accelerators trace | - | - | Xavier+ | Xavier+ |
OS Event Trace | ftrace | ETW | ftrace | ftrace, QNX kernel events |
Investigate CPU-GPU interactions and bubbles | ||||
User annotations API trace (NVIDIA Tools Extension API, NVTX) | yes | yes | yes | yes |
CUDA API | yes | yes | yes | yes |
CUDA libraries trace (cuBLAS, cuDNN & TensorRT) | yes | no | yes | yes |
OpenGL API trace | yes | yes | yes | yes |
Vulkan API trace | yes | yes | no | no |
Direct3D12, Direct3D11, DXR, & PIX APIs | - | yes | - | - |
OpenXR | - | yes | - | - |
OptiX | 7.1+ | 7.1+ | - | - |
Bidirectional correlation of API and GPU workload | yes | yes | yes | yes |
Identify GPU idle and sparse usage | yes | yes | yes | yes |
Multi-GPU Graphics trace | OpenGL and Vulkan | Direct3D12, OpenGL and Vulkan | - | - |
Trace graphics resource migration between VRAM and System Memory | - | yes | - | - |
Ready for big data | ||||
Fast GUI capable of visualizing in excess of 10 million events on laptops | yes | yes | yes | yes |
Additional command line collection tool (see the example below) | yes | no | no | no |
NV-Docker container support | yes | - | - | - |
NVIDIA GPU Cloud support | yes | - | - | - |
Minimum user privilege level | user | administrator | root | root |
* On Intel Haswell and newer CPU architectures
** Only with OS runtime trace enabled. Some syscalls, such as those made from handcrafted assembly, may be missed. Backtraces may only appear if time thresholds are exceeded.
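As a companion to the command-line collection tool noted in the table, collected reports can also be summarized without opening the GUI; a minimal sketch, with the report filename illustrative.

```
# Print summary tables (CUDA API calls, GPU kernel times, memory operations,
# and more) from a collected report directly in the terminal.
nsys stats my_report.nsys-rep
```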
Archives
Access older versions of Nsight Systems in the GameWorks Download Center.
View older release notes in the Nsight Systems documentation archive.
Resources
Nsight Systems Documentation
You can also learn about installing & using the NVIDIA Tools Extension API (NVTX) here.
Nsight Tools Tutorial Center
Access the latest resources to get started with Nsight Systems.
Access Self-Paced Training
Get hands-on training for Nsight Systems with self-paced online courses from the NVIDIA Deep Learning Institute.
See more courses on Accelerated
Computing for Developers.
Fundamentals of Accelerated Computing with OpenACC
Learn how to profile applications to identify optimization needs, and more ways to accelerate C/C++ or Fortran applications with OpenACC.
Accelerating CUDA C++ Applications with Concurrent Streams
Build robust and efficient CUDA C++ applications that can leverage copy/compute overlap for significant performance gains.
Scaling Workloads Across Multiple GPUs with CUDA C++
Develop robust and efficient CUDA C++ applications that can leverage all available GPUs on a single node.
Optimizing CUDA Machine Learning Codes With Nsight Profiling Tools
Use Nsight Systems to analyze overall application structure and Nsight Compute to analyze and optimize individual CUDA kernels.
Tutorial Sessions

Profiling GPU Applications with Nsight Systems
This webinar gives an overview of NVIDIA's Nsight profiling tools. It explores how to analyze and optimize the performance of GPU-accelerated applications.

Fundamentals of Ray Tracing Development using Nsight Graphics and Nsight Systems
Learn how to utilize Nsight Graphics and Nsight Systems to profile and optimize 3D Applications that are using Ray Tracing.
Investigating Hidden Bottlenecks for Multi-Node Workloads
Learn how Nsight Systems can help users identify bottlenecks, investigate their causes, and support developers working at multi-GPU multi-node scales.
Optimizing Communication with Nsight Systems Network Profiling
Learn how to use Nsight Systems' network profiling capabilities and see how real-world applications utilize GPUs, CPUs, and networking hardware.
Overcoming Pre- and Post-Processing Bottlenecks in AI Imaging and CV Pipelines with CV-CUDA
Watch how Nsight Systems can be used to analyze performance markers and find optimization opportunities for cloud-scale AI.

Optimizing HPC Simulation and Visualization Code Using NVIDIA Nsight Systems
The NIH Center for Macromolecular Modeling and Bioinformatics used Nsight Systems to achieve a 3x performance increase analyzing large biomolecular systems.
Video Series
Learn about using Nsight Systems for CUDA Development in the CUDA Developer Tools tutorial series.

CUDA Developer Tools | NVIDIA Nsight Tools Ecosystem

CUDA Developer Tools | Introduction to Nsight Systems

CUDA Developer Tools | Introduction to Nsight Systems
Optimizing CUDA Memory Allocations Using NVIDIA Nsight Systems

Nsight Systems Command Line Feature Spotlight

Analyzing NCCL Usage with NVIDIA Nsight Systems
Nsight Systems Feature Spotlight: OpenMP

Nsight Systems - Vulkan Trace
Support
To provide feedback, request additional features, or report support issues, please use the Developer Forums.