Get Started With Nsight Systems

Download NVIDIA Nsight Systems

Nsight Systems 2025.2.1 is Available Now

Review the supported platforms for NVIDIA Nsight™ Systems to choose the correct version for your host and profiling target.

If you profile from the CLI, choose your platform based on where the CLI will run. If you use the GUI (Full Version) to view reports or to profile locally or remotely, choose your platform based on the architecture of the host PC where the GUI will run.

Also review the system requirements before downloading.
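
As a rough aid for the choice described above, the following Python sketch maps a host operating system and CPU architecture to one of the download names listed on this page. The function name `suggest_package` and the architecture strings it accepts are illustrative assumptions, not part of Nsight Systems. The mapping does not cover Jetson or DRIVE, where the tool is bundled with the platform SDK.

```python
import platform


def suggest_package(system, machine, gui=True):
    """Return the download name from this page that matches a host.

    `system` and `machine` are expected in the form returned by
    platform.system() and platform.machine(). Hypothetical helper: the
    mapping is an illustration of the guidance above, not an official tool.
    """
    system = system.lower()
    machine = machine.lower()
    variant = "Full Version" if gui else "CLI Only"

    if system == "darwin":
        # The macOS host package only views reports or profiles remotely.
        return "Nsight Systems macOS Host"
    if system in ("linux", "windows") and machine in ("x86_64", "amd64"):
        return f"Nsight Systems {variant}"
    if system == "linux" and machine in ("aarch64", "arm64"):
        # Assumes an Arm server or NVIDIA Grace host, not a Jetson or DRIVE device.
        return f"Nsight Systems Arm Servers and NVIDIA Grace {variant}"
    return None  # No matching download on this page.


if __name__ == "__main__":
    print(suggest_package(platform.system(), platform.machine()))
```

Pass `gui=False` when only the CLI will run on that machine, for example on a headless server.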


Desktop, workstation, and server platforms:



This download is for local and remote profiling of Windows and Linux servers, workstations, and gaming PCs. Profiling is supported on x86-64 architectures.


See the supported platforms for specifics about combinations of local, remote, and mixed-OS compatibilities.


Download:





Nsight Systems 2025.2.1 Full Version

Nsight Systems 2025.2.1 CLI Only



Nsight Systems 2025.2.1 Arm Servers and NVIDIA Grace Full Version

Nsight Systems 2025.2.1 Arm Servers and NVIDIA Grace CLI Only



Download Nsight Systems 2025.2.1 macOS Host

This platform only supports viewing reports collected from a CLI or remotely profiling Linux laptops, desktops, workstations, and servers.


See the supported platforms.


Kubernetes integration:



The Nsight Tools Sidecar Injector enables your containerized applications to be profiled by NVIDIA Nsight applications (currently, only using Nsight Systems). This solution uses a Kubernetes dynamic admission controller to automatically add the following to your Pod: an init container, volumes containing Nsight Systems, its configurations, environment variables, and security context.



JupyterLab integration:



The Nsight Tools JupyterLab Extension allows you to profile cells and notebooks in Jupyter, including detailed analysis with the full Nsight Systems GUI.



Embedded and automotive platforms:



Nsight Systems is bundled as part of the Jetson development suite in the NVIDIA JetPack™ SDK.



Nsight Systems is bundled as part of DRIVE OS for development and deployment on NVIDIA DRIVE AGX™-based autonomous vehicles.


View Nsight Systems documentation.




Supported Platforms


Nsight Systems is distributed through multiple packages. Pick a “Profiling Target” column and learn what hosts may be used to profile (local or remote) as well as view reports.


From Host \ Profiling Target | Linux Workstations & Servers | Windows Workstations & Gaming PCs | NVIDIA DPUs & SuperNICs | Jetson & IGX | DRIVE
Windows | Remote GUI*, Report Viewer** | Local CLI & GUI, Remote GUI*, Report Viewer** | Remote CLI, Remote Report Viewer | Remote Report Viewer | Remote Report Viewer
Mac | Remote GUI*, Report Viewer** | Remote Report Viewer** | Remote Report Viewer | Remote Report Viewer | Remote Report Viewer
Linux | Local CLI & GUI, Remote GUI*, Report Viewer** | Remote GUI, Report Viewer** | Remote CLI, Remote Report Viewer | Remote GUI, Report Viewer*** | Remote GUI, Report Viewer***
DPU / SuperNIC | N/A | N/A | Local CLI | N/A | N/A
Jetson | N/A | N/A | N/A | Local CLI & GUI, Report Viewer*** | N/A
DRIVE | N/A | N/A | N/A | N/A | Local CLI

* For x86-64 targets only or opening report collected from a CLI

** Only for reports collected from Windows or Linux PCs & servers of equal or lesser versions

*** Only for reports collected from Jetson or DRIVE OS of equal or lesser versions




System Requirements


Nsight Systems is compatible with Windows workstations and PCs, Linux workstations and servers, and Jetson and NVIDIA DRIVE autonomous machines. Learn about the system requirements and support for your development platform below.


Windows Workstations and Gaming PCs
  • Operating Systems: Windows 10 or newer
  • Target Hardware: GPU: Pascal or newer; CPU: x86-64 processors
  • Target Software: 64-bit applications only; CUDA 10.0+ for CUDA trace; Driver 418 or newer***
  • Local Profiling: CLI and GUI
  • Remote Profiling From Platforms: Windows 10+, macOS 11+, Ubuntu 20.04+

Linux Workstations and Servers
  • Operating Systems: Ubuntu 24.04, 22.04, and 20.04*; WSL-Ubuntu 2.0; CentOS 7+*; RHEL 7, 8, 9; SLES 15; Debian 10, 11, 12; Fedora 37; KylinOS 10; OpenSUSE 15; Rocky 8, 9
  • Target Hardware: GPU: Pascal or newer; CPU: x86-64 processors**
  • Target Software: 64-bit applications only; CUDA 10.0+ for CUDA trace; Driver 418 or newer***
  • Local Profiling: CLI and GUI
  • Remote Profiling From Platforms: Windows 10+, macOS 11+, Ubuntu 20.04+

Linux Arm Servers
  • Operating Systems: Ubuntu 20.04* and 22.04; RHEL 8, 9; SLES 15
  • Target Hardware: GPU: Pascal or newer; Arm-SBSA servers
  • Target Software: 64-bit applications only; CUDA 10.0+ for CUDA trace; Driver 418 or newer***
  • Local Profiling: CLI and GUI
  • Remote Profiling From Platforms: N/A

Jetson and DRIVE Autonomous Machines
  • Operating Systems: Jetson Linux; DRIVE OS
  • Target Hardware: NVIDIA IGX, Jetson AGX Orin, Jetson AGX Xavier, Jetson TX2, Jetson TX1, DRIVE AGX Orin, DRIVE AGX Pegasus, DRIVE AGX Xavier, DRIVE PX Parker AutoChauffeur, DRIVE PX Parker AutoCruise
  • Local Profiling: CLI (all platforms), GUI (Jetson Linux only)
  • Remote Profiling From Platforms: Ubuntu 22.04

* For older OS versions, please use Nsight Systems 2020.3
** Intel Haswell architecture or newer is required for LBR sampling backtrace
*** Driver 535 and newer improves GPU profiling stability. Please use the latest driver for the best results. Download here.
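
The driver and CUDA thresholds above can be expressed as a small check. The sketch below is a hypothetical helper (the name `check_target_software` and the message wording are assumptions); it only compares version strings that you supply and does not query the driver itself.

```python
MIN_DRIVER_MAJOR = 418          # minimum driver stated in the requirements above
RECOMMENDED_DRIVER_MAJOR = 535  # footnote: 535 and newer improves stability
MIN_CUDA = (10, 0)              # CUDA 10.0+ is required for CUDA trace


def check_target_software(driver_version, cuda_version):
    """Return a list of notes about versions that fall below the thresholds.

    Both arguments are dotted version strings such as "550.54.14" or "12.4".
    An empty list means neither threshold was missed.
    """
    driver_major = int(driver_version.split(".")[0])

    cuda_parts = cuda_version.split(".")
    cuda_major = int(cuda_parts[0])
    cuda_minor = int(cuda_parts[1]) if len(cuda_parts) > 1 else 0

    notes = []
    if driver_major < MIN_DRIVER_MAJOR:
        notes.append("driver is older than 418, below the stated minimum")
    elif driver_major < RECOMMENDED_DRIVER_MAJOR:
        notes.append("driver is older than 535; a newer driver may be more stable")
    if (cuda_major, cuda_minor) < MIN_CUDA:
        notes.append("CUDA is older than 10.0, so CUDA trace is not supported")
    return notes


if __name__ == "__main__":
    for note in check_target_software("470.82.01", "11.4"):
        print(note)
```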






Release Notes


2025.2.1

  • Highlights:
    • Dask API trace
    • PyTorch enhancements
    • Python 3.13 support
    • CUDA trace enhancements
      • (Beta) Hardware based low-overhead CUDA trace for NVIDIA Blackwell (--trace=cuda-hw)
      • GPU Direct Storage trace (--trace=gdc)
      • CUDA device side event trace (--cuda-event-trace)
      • Graph trace improvements
      • Kernel CGA dimensions & policy
      • Stream priority in tooltips
      • NVIDIA Confidential Compute support improvements
    • Windows graphics trace enhancements
      • GPU Frame Duration for DLSS Frame Generation
      • GPU resource trace tracks pre-start allocation names
      • Graphics Hotspot Analysis recipe
    • Linux system trace enhancements
      • Syscall trace enhancements (requires CAP_BPF and CAP_PERFMON)
        • Support system-wide mode (--syscall=pid-namespace)
        • Collect backtraces
      • OS Runtime Trace (OSRT) VFS POSIX functions trace (--osrt-file-access=true)
    • NVIDIA Grace support enhancements
      • Topdown analysis recipe for PMU events based on NVTX range annotations
      • Updates to available counters & metrics
    • NVIDIA Tools Extension (NVTX) API & support enhancements
      • Various bug fixes
      • Payloads Extensions
      • Counters Extensions
      • Deferred Events Extensions
      • Updates bundled within NVIDIA Nsight Systems and GitHub
    • NVIDIA Nsight Systems Plugins
      • Callback for last-chance to submit NVTX deferred events on stop
      • Windows support
    • GUI improvements
      • macOS GUI now available for arm64
      • Go to range — a timeline toolbar to quickly jump to the longest, shortest, and median ranges
    • NVIDIA Nsight Streamer is now available on NGC for viewing reports on remote headless servers
    • NVIDIA Nsight Operator releasing soon on NGC for Kubernetes
      • Learn more here and apply for early access features
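
Several of the highlights above are enabled through command-line options. The following sketch assembles an `nsys profile` command line in Python without running it. The option spellings `--trace=cuda-hw` and `--osrt-file-access=true` are copied from the release notes above; the helper name `build_profile_command`, the report name `report`, and the application `python train.py` are placeholders, and the exact option syntax should be confirmed with `nsys profile --help` for your installed version.

```python
import shlex


def build_profile_command(app_argv, traces=("cuda", "nvtx"), output="report",
                          extra_options=()):
    """Build an argv list for `nsys profile` (hypothetical helper).

    app_argv:      the application command to profile, as a list of strings.
    traces:        values joined into a single --trace=... option.
    output:        report name passed to --output.
    extra_options: additional options, passed through unchanged.
    """
    cmd = ["nsys", "profile",
           f"--trace={','.join(traces)}",
           f"--output={output}"]
    cmd.extend(extra_options)
    cmd.extend(app_argv)
    return cmd


if __name__ == "__main__":
    cmd = build_profile_command(
        ["python", "train.py"],                      # placeholder application
        traces=("cuda-hw", "nvtx", "osrt"),          # cuda-hw: beta, NVIDIA Blackwell
        extra_options=("--osrt-file-access=true",),  # VFS POSIX functions trace
    )
    # Print the command for review; pass `cmd` to subprocess.run() to execute it.
    print(shlex.join(cmd))
```

Building the command as a list keeps arguments with spaces intact if the list is later passed to `subprocess.run`.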


2025.1.1

  • Highlights:
    • CUDA 12.8 support
    • Keep last N seconds CLI option: to retain the most relevant data when trying to record an unpredictable event, use the --keep=N option with the ‘nsys stop’ command
    • Recipes
      • Summary of GPU metrics samples per range (NVTX or CUDA kernels)
    • PyTorch
      • Command-line option for enabling PyTorch autograd layer NVTX ranges
      • Command-line option to trace prominent PyTorch Python functions
    • Windows
      • Graphics resource tracker now includes resource priority changes
      • Reduced memory overhead when generating reports that contain ETW data
    • Python 3.12 support for scripts & recipes
    • Linux self-unpacking .run installer is now available for Arm64 SBSA systems
    • Preview: Multi-pass script to run all CPU counters involved in Arm Top-Down analysis


2024.7.1

  • Highlights:
    • Storage profiling functionality expanded
      • Lustre and NFS volumes performance metrics on the client
      • Local disks and NVMe-oF volumes performance metrics on the client
      • Recipe for storage utilization per volume:
        • Cumulative throughput line graphs
        • Throughput heatmaps
    • Statistical analysis of MPI communication parameters
    • Installer package variant added for NVIDIA Grace - .run
    • Option to keep only last N seconds when generating a report file
    • Option to export a subsection of the timeline into a new report file
    • Enable "Resolve Symbols when creating report" by default on Windows and resolve only the symbols of the most relevant process


2024.6.1

  • Highlights:
    • DirectX12 memory resource trace improvements:
      • New chart displays the distribution of DirectX12 resource types allocated by the target process in VRAM, across the trace session.
    • Storage profiling (beta)
      • Lustre & NFS - Performance metrics and operation counters for mounted volumes. (beta)
      • NVMe-oF (NVMe over Fabrics) - Performance metrics from mounted volumes. (beta)
      • Local storage - Performance metrics from local devices (beta)
      • Example use-cases
        • Verify GPU compute & storage transfer concurrency
        • Identify GPU compute idle time due to storage transfer dependencies


2024.5.1

  • Highlights:
    • GPU metrics enhancements
      • Compute triage for NVIDIA H100 (preview)
      • Sync and async copy engine activity
    • Python call-stack sample statistics tables
    • OS System-Call trace (beta)
    • GPU power metrics sampler (preview)
    • NFS metrics sampler (beta)
    • Net Interface metrics sampler and plugin example code (beta)
    • CUDA trace support for devices with attribute cudaDevAttrD3D12CigSupported
    • New analysis recipes
      • NVIDIA NIC traffic statistics
      • MPI communication parameters statistics
    • Windows memory resources trace improvements
      • System-wide committed VRAM timeline chart
      • Additional memory transfer states: pending and in progress
      • Vulkan device memory object names
      • Vulkan bound resources names
    • Quality of life improvements
      • Windows symbols resolver now faster & limited to relevant processes
      • Windows ETL import now includes page fault events
      • Graphics frame health tooltips add thread IDs of API calls
  • Additional Details:
  • GPU metrics enhancements
      A new metrics set, focused on compute triage, is introduced for NVIDIA H100. This set contains far more metrics, so it must be collected at a lower sampling rate or for a shorter duration; otherwise the buffers will overflow. Copy engines are also exposed in the general metrics set to better understand GPU activity on some architectures, such as the NVIDIA Ada architecture. Synchronous copy engines are used in graphics command sequences. Asynchronous copy engines are used in both compute and graphics to copy resources, typically in the background.
  • Python call-stack sample statistics tables
      Python call-stack sampling was initially presented only as markers on the timeline. Call-stack samples from C, C++, and other ELF binaries have historically also been presented in function statistics tables for exploring the frequency of function calls and stack paths. The same views are now available for Python.
  • OS System-Call trace (beta)
      A system-wide alternative to OS Runtime API trace (OSRT). OS runtime APIs are typically implemented with system calls (syscall instructions). System call trace collects syscall events from kernel space. Sudo or CAP_SYS_ADMIN is typically required.
  • GPU power metrics sampler (preview)
      Low frequency sampling of GPU power and temperatures, for users interested in understanding when clocks are adjusted by temperature or optimizing for power.
  • NFS metrics sampler (beta)
      See how network file system activity may relate to idle GPU time. Accessing data over the network just-in-time can cause delays vs prefetching it.
  • Net Interface metrics sampler and plugin example code (beta)
      Building on existing networking support for NVIDIA NICs and AWS EFAs, this samples throughput on Linux network interfaces. These can be physical devices or virtual adapters for mechanisms such as containers and virtual machines. I/O patterns are sometimes on the critical path of computation instead of being asynchronous, or are not started early enough to avoid blocking the critical path.
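
The release notes name Linux capabilities for syscall trace: CAP_SYS_ADMIN (or sudo) for 2024.5.1, and CAP_BPF plus CAP_PERFMON for 2025.2.1. The sketch below shows one way to test whether a process holds them, by decoding the `CapEff` bitmask from `/proc/<pid>/status`. The bit numbers are the standard Linux capability numbers. Treating either set as sufficient is an assumption made for illustration, and all helper names are hypothetical.

```python
# Standard Linux capability bit numbers (from linux/capability.h).
CAP_SYS_ADMIN = 21
CAP_PERFMON = 38
CAP_BPF = 39


def parse_cap_eff(status_text):
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_capability(mask, cap):
    """Return True if bit `cap` is set in the capability mask."""
    return bool((mask >> cap) & 1)


def can_trace_syscalls(mask):
    """Assumption: either CAP_SYS_ADMIN, or CAP_BPF with CAP_PERFMON, suffices."""
    return has_capability(mask, CAP_SYS_ADMIN) or (
        has_capability(mask, CAP_BPF) and has_capability(mask, CAP_PERFMON)
    )


def read_own_cap_eff(path="/proc/self/status"):
    """Read the current process's effective capabilities (Linux only)."""
    with open(path) as status_file:
        return parse_cap_eff(status_file.read())
```

On a Linux host, `can_trace_syscalls(read_own_cap_eff())` reports whether the current process already holds one of these sets before a trace is attempted.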








Feature Table

Feature Linux Workstations and Servers Windows Workstations and Gaming PCs Jetson Autonomous Machines DRIVE Autonomous Vehicles
View system-wide application behavior across CPUs and GPUs
CPU cores utilization, process, & thread activities yes yes yes yes
CPU thread periodic sampling backtraces yes* yes yes yes
CPU thread blocked state backtraces yes** yes yes yes
CPU performance metrics yes no yes yes
GPU workload trace yes yes yes yes
GPU context switch trace yes yes yes yes
SOC hypervisor trace - - - yes
SOC memory bandwidth sampling - - yes yes
SOC Accelerators trace - - Xavier+ Xavier+
OS Event Trace ftrace ETW ftrace ftrace, QNX kernel events
Investigate CPU-GPU interactions and bubbles
User annotations API trace: NVIDIA Tools Extension API (NVTX) yes yes yes yes
CUDA API yes yes yes yes
CUDA libraries trace (cuBLAS, cuDNN & TensorRT) yes no yes yes
OpenGL API trace yes yes yes yes
Vulkan API trace yes yes no no
Direct3D12, Direct3D11, DXR, & PIX APIs - yes - -
OpenXR - yes - -
OptiX 7.1+ 7.1+ - -
Bidirectional correlation of API and GPU workload yes yes yes yes
Identify GPU idle and sparse usage yes yes yes yes
Multi-GPU Graphics trace OpenGL and Vulkan Direct3D12, OpenGL and Vulkan - -
Trace graphics resource migration between VRAM and System Memory - yes - -
Ready for big data
Fast GUI capable of visualizing in excess of 10 million events on laptops yes yes yes yes
Additional command line collection tool yes no no no
NV-Docker container support yes - - -
NVIDIA GPU Cloud support yes - - -
Minimum user privilege level user administrator root root

* On Intel Haswell and newer CPU architecture
** Only with OS runtime trace enabled. Some syscalls, such as those made from handcrafted assembly, may be missed. Backtraces may only appear if time thresholds are exceeded.






Archives

Access older versions of Nsight Systems in the Gameworks Download Center.
View release notes for older versions in the Nsight Systems documentation archive.






Resources

Nsight Systems Documentation

You can also learn about installing & using the NVIDIA Tools Extension API (NVTX) here.

Nsight Tools Tutorial Center

Access the latest resources to get started with Nsight Systems.




Access Self-Paced Training


Get hands on training for Nsight Systems with self-paced online courses from the NVIDIA Deep Learning Institute.

See more courses on Accelerated Computing for Developers.

By Invitation only: Fundamentals of Accelerated Computing with CUDA C/C++

Learn how to optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques.

Learn More

Fundamentals of Accelerated Computing with OpenACC

Learn how to profile applications to identify optimization needs, and more ways to accelerate C/C++ or Fortran applications with OpenACC.

Learn More

Accelerating CUDA C++ Applications with Concurrent Streams

Build robust and efficient CUDA C++ applications that can leverage copy/compute overlap for significant performance gains.

Enroll Now

Scaling Workloads Across Multiple GPUs with CUDA C++

Develop robust and efficient CUDA C++ applications that can leverage all available GPUs on a single node.

Enroll Now

Optimizing CUDA Machine Learning Codes With Nsight Profiling Tools

Use Nsight Systems to analyze overall application structure and Nsight Compute to analyze and optimize individual CUDA kernels.

Enroll Now





Tutorial Sessions


Profiling GPU Applications with Nsight Systems

This webinar gives an overview of NVIDIA's Nsight profiling tools. It explores how to analyze and optimize the performance of GPU-accelerated applications.

Watch (54:52)

Fundamentals of Ray Tracing Development using Nsight Graphics and Nsight Systems

Learn how to utilize Nsight Graphics and Nsight Systems to profile and optimize 3D Applications that are using Ray Tracing.

Watch (2:04:45)

Investigating Hidden Bottlenecks for Multi-Node Workloads

Learn how Nsight Systems can help users identify bottlenecks, investigate their causes, and support developers working at multi-GPU multi-node scales.

Watch (47:21)

Optimizing Communication with Nsight Systems Network Profiling

Learn how to use Nsight Systems' network profiling capabilities and see how real-world applications utilize GPUs, CPUs, and networking hardware.

Watch (41:45)

Overcoming Pre- and Post-Processing Bottlenecks in AI Imaging and CV Pipelines with CV-CUDA

Watch how Nsight Systems can be used to analyze performance markers and find optimization opportunities for cloud-scale AI.

Watch (42:47)

Optimizing HPC Simulation and Visualization Code Using NVIDIA Nsight Systems

The NIH Center for Macromolecular Modeling and Bioinformatics used Nsight Systems to achieve a 3x performance increase analyzing large biomolecular systems.

Watch (40:57)





Video Series

Learn about using Nsight Systems for CUDA Development in the CUDA Developer Tools tutorial series.


CUDA Developer Tools | NVIDIA Nsight Tools Ecosystem

Watch (4:53)

CUDA Developer Tools | Introduction to Nsight Systems

Watch (9:20)
CUDA Developer Tools | Performance Analysis with the Nsight Systems Timeline

Watch (9:20)

Optimizing CUDA Memory Allocations Using NVIDIA Nsight Systems

Watch (1:25)

Nsight Systems Command Line Feature Spotlight

Watch (1:38)

Analyzing NCCL Usage with NVIDIA Nsight Systems

Watch (1:58)

Nsight Systems Feature Spotlight: OpenMP

Watch (1:19)

Nsight Systems - Vulkan Trace

Watch (1:28)





Support

To provide feedback, request additional features, or report support issues, please use the Developer Forums.