Get Started With Nsight Systems
Download NVIDIA Nsight Systems
Nsight Systems 2025.2.1 is Available Now
Review the supported platforms for NVIDIA
Nsight™ Systems to choose the correct version for your host and profiling target.
If profiling from the CLI, pick your platform based on where the CLI will be run. If using the GUI (Full Version) to view reports, do profiling, or do remote profiling, pick your platform based on the host PC architecture where the GUI will be run.
Also review the system requirements before
downloading.
Desktop, workstation, and server platforms:
This download is for local and remote profiling of Windows and Linux servers, workstations, and gaming PCs. Profiling is supported on x86-64 architectures.
See the supported platforms for specifics about combinations of local, remote, and mixed-OS compatibilities.
Download:
Nsight Systems 2025.2.1 Full Version
Nsight Systems 2025.2.1 CLI Only
Nsight Systems 2025.2.1 Arm Servers and NVIDIA Grace Full Version
Nsight Systems 2025.2.1 Arm Servers and NVIDIA Grace CLI Only
Download Nsight Systems 2025.2.1 macOS Host
This platform only supports viewing reports collected from a CLI or remotely profiling Linux laptops, desktops, workstations, and servers.
See the supported platforms.
Kubernetes integration:
The Nsight Tools Sidecar Injector enables your containerized applications to be profiled by NVIDIA Nsight tools (currently, only Nsight Systems). It uses a Kubernetes dynamic admission controller to automatically add an init container, volumes containing Nsight Systems and its configuration, environment variables, and a security context to your Pod.
JupyterLab integration:
The Nsight Tools JupyterLab Extension allows you to profile cells and notebooks in Jupyter, including detailed analysis with the full Nsight Systems GUI.
Embedded and automotive platforms:
Nsight Systems is bundled as part of the Jetson development suite in the NVIDIA JetPack™ SDK.
Nsight Systems is bundled as part of DRIVE OS for development and deployment on NVIDIA DRIVE AGX™-based autonomous vehicles.
View Nsight Systems documentation.
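Once a package is installed, a first command-line collection typically looks like the sketch below; the application name is illustrative, and `nsys profile --help` lists the full set of options.

```
# Trace CUDA, NVTX, and OS runtime activity from a sample application and
# write the result to my_report.nsys-rep for the GUI or `nsys stats`.
nsys profile --trace=cuda,nvtx,osrt --output=my_report ./my_app
```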
Supported Platforms
Nsight Systems is distributed through multiple packages. Pick a “Profiling Target” column to see which host platforms can profile that target (locally or remotely) and view its reports.
| From Host \ Profiling Target | Linux Workstations & Servers | Windows Workstations & Gaming PCs | NVIDIA DPUs & SuperNICs | Jetson & IGX | DRIVE |
|---|---|---|---|---|---|
| Windows | Remote GUI*, Report Viewer** | Local CLI & GUI, Remote GUI*, Report Viewer** | Remote CLI, Remote Report Viewer | Remote Report Viewer | Remote Report Viewer |
| Mac | Remote GUI*, Report Viewer** | Remote Report Viewer** | Remote Report Viewer | Remote Report Viewer | Remote Report Viewer |
| Linux | Local CLI & GUI, Remote GUI*, Report Viewer** | Remote GUI, Report Viewer** | Remote CLI, Remote Report Viewer | Remote GUI, Report Viewer*** | Remote GUI, Report Viewer*** |
| DPU / SuperNIC | N/A | N/A | Local CLI | N/A | N/A |
| Jetson | N/A | N/A | N/A | Local CLI & GUI, Report Viewer*** | N/A |
| DRIVE | N/A | N/A | N/A | N/A | Local CLI |
* For x86-64 targets only, or for opening reports collected from a CLI
** Only for reports collected from Windows or Linux PCs & servers with the same or an earlier version
*** Only for reports collected from Jetson or DRIVE OS with the same or an earlier version
System Requirements
Nsight Systems is compatible with Windows workstations and PCs, Linux workstations and servers, as well as Jetson and NVIDIA DRIVE autonomous machines. Learn about the system requirements and support for your development platform below.
|  | Windows Workstations & PCs | Linux Workstations & Servers (x86-64) | Linux Arm SBSA Servers & NVIDIA Grace | Jetson & DRIVE |
|---|---|---|---|---|
| Operating Systems | Windows 10 or newer |  |  | Jetson Linux, DRIVE OS |
| Target Hardware | GPU: Pascal or newer; CPU: x86-64 processors | GPU: Pascal or newer; CPU: x86-64 processors** | GPU: Pascal or newer; Arm-SBSA servers | NVIDIA IGX, Jetson AGX Orin, Jetson AGX Xavier, Jetson TX2, Jetson TX1, DRIVE AGX Orin, DRIVE AGX Pegasus, DRIVE AGX Xavier, DRIVE PX Parker AutoChauffeur, DRIVE PX Parker AutoCruise |
| Target Software | 64-bit applications only; CUDA 10.0+ for CUDA trace; Driver 418 or newer*** | 64-bit applications only; CUDA 10.0+ for CUDA trace; Driver 418 or newer*** | 64-bit applications only; CUDA 10.0+ for CUDA trace; Driver 418 or newer*** |  |
| Local Profiling | CLI and GUI | CLI and GUI | CLI and GUI | CLI (all platforms), GUI (Jetson Linux only) |
| Remote Profiling From Platforms | Windows 10+, macOS 11+, Ubuntu 20.04+ | Windows 10+, macOS 11+, Ubuntu 20.04+ | N/A | Ubuntu 22.04 |
* For older OS versions, please use Nsight Systems 2020.3
** Intel Haswell architecture or newer is required for LBR sampling backtrace
*** Driver 535 and newer improves GPU profiling stability. Please use the latest driver for the best results. Download here.
Release Notes
2025.2.1
- Highlights:
- Dask API trace
- PyTorch enhancements
- Python 3.13 support
- CUDA trace enhancements (see the example command after this list)
- (Beta) Hardware-based low-overhead CUDA trace for NVIDIA Blackwell (--trace=cuda-hw)
- GPU Direct Storage trace (--trace=gdc)
- CUDA device side event trace (--cuda-event-trace)
- Graph trace improvements
- Kernel CGA dimensions & policy
- Stream priority in tooltips
- NVIDIA Confidential Compute support improvements
- Windows graphics trace enhancements
- GPU Frame Duration for DLSS Frame Generation
- GPU resource trace tracks pre-start allocation names
- Graphics Hotspot Analysis recipe
- Linux system trace enhancements
- Syscall trace enhancements (requires CAP_BPF and CAP_PERFMON)
- Support system-wide mode (--syscall=pid-namespace)
- Collect backtraces
- OS Runtime Trace (OSRT) VFS POSIX functions trace (--osrt-file-access=true)
- NVIDIA Grace support enhancements
- Topdown analysis recipe for PMU events based on NVTX range annotations
- Updates to available counters & metrics
- NVIDIA Tools Extension (NVTX) API & support enhancements
- Various bug fixes
- Payloads Extensions
- Counters Extensions
- Deferred Events Extensions
- Updates bundled within NVIDIA Nsight Systems and on GitHub
- NVIDIA Nsight Systems Plugins
- Last-chance callback to submit NVTX deferred events on stop
- Windows support
- GUI improvements
- macOS GUI now available for arm64
- Go to range — a timeline toolbar to quickly jump to the longest, shortest, and median ranges
- NVIDIA Nsight Streamer is now available on NGC for viewing reports on remote headless servers
- NVIDIA Nsight Operator releasing soon on NGC for Kubernetes
- Learn more here and apply for early access features
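A hedged sketch combining the CLI switches quoted in these notes: the flag names are taken verbatim from the list above, while the boolean value syntax and whether they can be combined are assumptions to verify against `nsys profile --help`.

```
# Hardware-based CUDA trace (beta, NVIDIA Blackwell) together with the
# device-side CUDA event trace and OSRT file-access trace named above.
nsys profile --trace=cuda-hw \
             --cuda-event-trace=true \
             --osrt-file-access=true \
             --output=blackwell_trace ./my_app
```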
2025.1.1
- Highlights:
- CUDA 12.8 support
- Keep last N seconds CLI option: to retain the most relevant data when trying to record an unpredictable event, use the --keep=N option with the ‘nsys stop’ command (see the example after this list)
- Recipes
- Summary of GPU metrics samples per range (NVTX or CUDA kernels)
- PyTorch
- Command-line option for enabling PyTorch autograd layer NVTX ranges
- Command-line option to trace prominent PyTorch Python functions
- Windows
- Graphics resource tracker now includes resource priority changes
- Reduced memory overhead when generating reports that contain ETW data
- Python 3.12 support for scripts & recipes
- Linux self-unpacking .run installer is now available for Arm64 SBSA systems
- Preview: Multi-pass script to run all CPU counters involved in Arm Top-Down analysis
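A minimal sketch of the keep-last-N-seconds workflow described above, assuming the default interactive session; the application name and the 10-second window are illustrative.

```
# Launch the target under an Nsight Systems session, start an open-ended
# collection, then keep only the most recent 10 seconds when stopping.
nsys launch ./my_app &
nsys start --output=ring_capture
# ... wait for the unpredictable event to occur ...
nsys stop --keep=10
```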
2024.7.1
- Highlights:
- Storage profiling functionality expanded
- Lustre and NFS volumes performance metrics on the client
- Local disks and NVMe-oF volumes performance metrics on the client
- Recipe for storage utilization per volume (see the invocation sketch after this list):
- Cumulative throughput line graphs
- Throughput heatmaps
- Statistical analysis of MPI communication parameters
- Installer package variant added for NVIDIA Grace - .run
- Option to keep only last N seconds when generating a report file
- Option to export a subsection of the timeline into a new report file
- Enable "Resolve Symbols when creating report" by default on Windows and resolve only the symbols of the most relevant process
2024.6.1
- Highlights:
- DirectX12 memory resource trace improvements:
- New chart displays the distribution of DirectX12 resource types allocated by the target process in VRAM, across the trace session.
- Storage profiling (beta)
- Lustre & NFS - Performance metrics and operation counters for mounted volumes. (beta)
- NVMe-oF (NVMe over Fabrics) - Performance metrics from mounted volumes. (beta)
- Local storage - Performance metrics from local devices (beta)
- Example use-cases
- Verify GPU compute & storage transfer concurrency
- Identify GPU compute idle time caused by storage transfer dependencies
2024.5.1
- Highlights:
- GPU metrics enhancements
- Compute triage for NVIDIA H100 (preview)
- Sync and async copy engine activity
- Python call-stack sample statistics tables
- OS System-Call trace (beta)
- GPU power metrics sampler (preview)
- NFS metrics sampler (beta)
- Net Interface metrics sampler and plugin example code (beta)
- CUDA trace support for devices with attribute cudaDevAttrD3D12CigSupported
- New analysis recipes
- NVIDIA NIC traffic statistics
- MPI communication parameters statistics
- Windows memory resources trace improvements
- System-wide committed VRAM timeline chart
- Additional memory transfer states: pending and in progress
- Vulkan device memory object names
- Vulkan bound resources names
- Quality of life improvements
- Windows symbols resolver now faster & limited to relevant processes
- Windows ETL import now includes page fault events
- Graphics frame health tooltips add thread IDs of API calls
- Additional Details:
- GPU metrics enhancements
  A new metrics set is introduced for NVIDIA H100, focused on compute triage. This set contains far more metrics, so it must be collected at a lower sampling rate or for a shorter duration; otherwise the buffers will overflow.
  Copy engines are also exposed in the general metrics set to better illustrate GPU activity on architectures such as NVIDIA Ada. Synchronous copy engines are used in graphics command sequences; async copy engines are used by both compute and graphics to copy resources, typically in the background.
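  A hedged sketch of collecting GPU metrics from the CLI: the `--gpu-metrics-device`, `--gpu-metrics-set`, and `--gpu-metrics-frequency` flags reflect commonly documented option names but may vary by version, and the set identifier and reduced 5 kHz rate are illustrative of the guidance above.

```
# Sample GPU metrics on all GPUs at a reduced rate to make room for a larger
# metrics set; the set identifier is a placeholder.
nsys profile --gpu-metrics-device=all \
             --gpu-metrics-set=<h100-triage-set> \
             --gpu-metrics-frequency=5000 \
             --output=h100_triage ./my_app
```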
- Python call-stack sample statistics tables
  Initial Python call-stack sampling support was presented only as markers on the timeline. Call-stack samples from C, C++, and ELF binaries have historically also been shown in function statistics tables for exploring the frequency of function calls and stack paths. These same views are now available for Python as well.
- OS System-Call trace (beta)
  A system-wide alternative to OS runtime API trace (OSRT). OS runtime APIs are typically implemented with system calls (syscall instructions); system-call trace collects syscall events from kernel space. Sudo or CAP_SYS_ADMIN is typically required.
- GPU power metrics sampler (preview)
  Low-frequency sampling of GPU power and temperature, for users interested in understanding when clocks are adjusted due to temperature or in optimizing for power.
- NFS metrics sampler (beta)
  See how network file system activity may relate to idle GPU time. Accessing data over the network just in time can introduce delays compared with prefetching it.
- Net Interface metrics sampler and plugin example code (beta)
  Building on the existing networking support for NVIDIA NICs and AWS EFA, this samples throughput on Linux network interfaces. These can be physical devices or virtual adapters used by mechanisms such as containers and virtual machines. I/O is sometimes on the critical path of computation instead of running asynchronously, or is not started early enough to avoid blocking the critical path.
Feature Table
Feature | Linux Workstations and Servers | Windows Workstations and Gaming PCs | Jetson Autonomous Machines | DRIVE Autonomous Vehicles |
---|---|---|---|---|
View system-wide application behavior across CPUs and GPUs | ||||
CPU cores utilization, process, & thread activities | yes | yes | yes | yes |
CPU thread periodic sampling backtraces | yes* | yes | yes | yes |
CPU thread blocked state backtraces | yes** | yes | yes | yes |
CPU performance metrics | yes | no | yes | yes |
GPU workload trace | yes | yes | yes | yes |
GPU context switch trace | yes | yes | yes | yes |
SOC hypervisor trace | - | - | - | yes |
SOC memory bandwidth sampling | - | - | yes | yes |
SOC Accelerators trace | - | - | Xavier+ | Xavier+ |
OS Event Trace | ftrace | ETW | ftrace | ftrace, QNX kernel events |
Investigate CPU-GPU interactions and bubbles | ||||
User annotations API trace (NVIDIA Tools Extension API, NVTX) | yes | yes | yes | yes |
CUDA API | yes | yes | yes | yes |
CUDA libraries trace (cuBLAS, cuDNN & TensorRT) | yes | no | yes | yes |
OpenGL API trace | yes | yes | yes | yes |
Vulkan API trace | yes | yes | no | no |
Direct3D12, Direct3D11, DXR, & PIX APIs | - | yes | - | - |
OpenXR | - | yes | - | - |
OptiX | 7.1+ | 7.1+ | - | - |
Bidirectional correlation of API and GPU workload | yes | yes | yes | yes |
Identify GPU idle and sparse usage | yes | yes | yes | yes |
Multi-GPU Graphics trace | OpenGL and Vulkan | Direct3D12, OpenGL and Vulkan | - | - |
Trace graphics resource migration between VRAM and System Memory | - | yes | - | - |
Ready for big data | ||||
Fast GUI capable of visualizing in excess of 10 million events on laptops | yes | yes | yes | yes |
Additional command line collection tool (see the example below) | yes | no | no | no |
NV-Docker container support | yes | - | - | - |
NVIDIA GPU Cloud support | yes | - | - | - |
Minimum user privilege level | user | administrator | root | root |
* On Intel Haswell and newer CPU architectures
** Only with OS runtime trace enabled. Some syscalls, such as those made from handcrafted assembly, may be missed. Backtraces may only appear if time thresholds are exceeded.
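As a companion to the command-line collection tool noted in the table, collected reports can also be summarized without opening the GUI; a minimal sketch, with the report filename illustrative.

```
# Print summary tables (CUDA API calls, GPU kernel times, memory operations,
# and more) from a collected report directly in the terminal.
nsys stats my_report.nsys-rep
```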
Archives
Access older versions of Nsight Systems in the GameWorks Download Center.
View older release notes in the Nsight Systems documentation archive.
Resources
Nsight Systems Documentation
You can also learn about installing & using the NVIDIA Tools Extension API (NVTX) here.
Nsight Tools Tutorial Center
Access the latest resources to get started with Nsight Systems.
Access Self-Paced Training
Get hands-on training for Nsight Systems with self-paced online courses from the NVIDIA Deep Learning Institute.
See more courses on Accelerated
Computing for Developers.
Fundamentals of Accelerated Computing with OpenACC
Learn how to profile applications to identify optimization needs, and more ways to accelerate C/C++ or Fortran applications with OpenACC.
Accelerating CUDA C++ Applications with Concurrent Streams
Build robust and efficient CUDA C++ applications that can leverage copy/compute overlap for significant performance gains.
Scaling Workloads Across Multiple GPUs with CUDA C++
Develop robust and efficient CUDA C++ applications that can leverage all available GPUs on a single node.
Optimizing CUDA Machine Learning Codes With Nsight Profiling Tools
Use Nsight Systems to analyze overall application structure and Nsight Compute to analyze and optimize individual CUDA kernels.
Tutorial Sessions

Profiling GPU Applications with Nsight Systems
This webinar gives an overview of NVIDIA's Nsight profiling tools. It explores how to analyze and optimize the performance of GPU-accelerated applications.

Fundamentals of Ray Tracing Development using Nsight Graphics and Nsight Systems
Learn how to utilize Nsight Graphics and Nsight Systems to profile and optimize 3D Applications that are using Ray Tracing.
Investigating Hidden Bottlenecks for Multi-Node Workloads
Learn how Nsight Systems can help users identify bottlenecks, investigate their causes, and support developers working at multi-GPU multi-node scales.
Optimizing Communication with Nsight Systems Network Profiling
Learn how to use Nsight Systems' network profiling capabilities and see how real-world applications utilize GPUs, CPUs, and networking hardware.
Overcoming Pre- and Post-Processing Bottlenecks in AI Imaging and CV Pipelines with CV-CUDA
Watch how Nsight Systems can be used to analyze performance markers and find optimization opportunities for cloud-scale AI.

Optimizing HPC Simulation and Visualization Code Using NVIDIA Nsight Systems
The NIH Center for Macromolecular Modeling and Bioinformatics used Nsight Systems to achieve a 3x performance increase analyzing large biomolecular systems.
Video Series
Learn about using Nsight Systems for CUDA Development in the CUDA Developer Tools tutorial series.

CUDA Developer Tools | NVIDIA Nsight Tools Ecosystem

CUDA Developer Tools | Introduction to Nsight Systems

CUDA Developer Tools | Introduction to Nsight Systems
Optimizing CUDA Memory Allocations Using NVIDIA Nsight Systems

Nsight Systems Command Line Feature Spotlight

Analyzing NCCL Usage with NVIDIA Nsight Systems
Nsight Systems Feature Spotlight: OpenMP

Nsight Systems - Vulkan Trace
Support
To provide feedback, request additional features, or report support issues, please use the Developer Forums.