NVIDIA® Nsight™ Systems is a system-wide performance analysis tool designed to visualize an application’s algorithms, help you identify the largest opportunities to optimize, and tune to scale efficiently across any quantity or size of CPUs and GPUs, from large servers to our smallest SoC.




Overview

NVIDIA Nsight Systems is a low-overhead performance analysis tool designed to provide the insights developers need to optimize their software. Unbiased activity data is visualized within the tool to help users investigate bottlenecks, avoid inferring false positives, and pursue optimizations with a higher probability of performance gains. Users can identify issues such as GPU starvation, unnecessary GPU synchronization, insufficient CPU parallelization, and even unexpectedly expensive algorithms across the CPUs and GPUs of their target platform. It is designed to scale across a wide range of NVIDIA platforms, such as large Tesla multi-GPU x86 servers, Quadro workstations, Optimus-enabled laptops, DRIVE devices with Tegra+dGPU multi-OS, and Jetson. NVIDIA Nsight Systems can even provide valuable insight into the behavior and load of deep learning frameworks such as PyTorch and TensorFlow, allowing users to tune their models and parameters to increase overall single- or multi-GPU utilization.
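
To make "GPU starvation" concrete, the following minimal sketch computes how much of a time window a GPU spent busy and where the idle gaps fall, given kernel start and end timestamps. The interval data, the function names, and the nanosecond unit are hypothetical and chosen only for illustration; Nsight Systems presents this kind of information visually on its timeline, and this is not how the tool itself is implemented.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ns, end_ns) of one hypothetical kernel execution


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching busy intervals into disjoint spans."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gpu_utilization(intervals: List[Interval], window: Interval) -> float:
    """Fraction of the window during which at least one kernel was running."""
    w_start, w_end = window
    busy = 0
    for start, end in merge_intervals(intervals):
        # Clip each busy span to the observation window.
        lo, hi = max(start, w_start), min(end, w_end)
        if hi > lo:
            busy += hi - lo
    return busy / (w_end - w_start)


def idle_gaps(intervals: List[Interval], window: Interval) -> List[Interval]:
    """Spans inside the window during which no kernel was running."""
    w_start, w_end = window
    gaps: List[Interval] = []
    cursor = w_start
    for start, end in merge_intervals(intervals):
        if end <= w_start or start >= w_end:
            continue  # Entirely outside the window.
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, min(end, w_end))
    if cursor < w_end:
        gaps.append((cursor, w_end))
    return gaps


if __name__ == "__main__":
    # Hypothetical kernel spans: two overlapping kernels, then a long stall.
    kernels = [(0, 100), (50, 200), (600, 700)]
    window = (0, 1000)
    print(f"utilization: {gpu_utilization(kernels, window):.0%}")
    print(f"idle gaps:   {idle_gaps(kernels, window)}")
```

Long gaps between kernels, like the one between 200 and 600 in this made-up data, are the pattern to look for when checking whether the CPU is failing to feed the GPU fast enough.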

Platforms

Learn about Nsight Systems on your platform:

 

Release Highlights

2020.3 - Announcement Post

  • NVIDIA Ampere Architecture
  • CUDA 11.0
  • CUDA Graph correlation
  • OptiX
  • Vulkan KHR ray tracing extension
  • OpenMP
  • CLI improvements
  • UX improvements

2020.2 - Announcement Post

  • Improved CLI support for
    • Power9 Architecture
    • ARM Server Base System Architecture
  • OpenMP 5
  • New CLI stats command
  • CPU utilization estimates for more secure environments limiting scheduler info
  • UX improvements
  • Vulkan 1.2 support
  • Visual Studio Integration is now available
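
The command-line workflow these release notes refer to can be scripted. The sketch below only assembles argument lists for `nsys profile` and `nsys stats` and prints them; it does not run anything. The application path `./my_app`, the report name `my_report`, the report file name passed to `nsys stats`, and the helper function names are all hypothetical placeholders, and the set of traced APIs is an arbitrary choice.

```python
import shlex
from typing import List, Sequence


def profile_command(app: Sequence[str], output: str,
                    trace: Sequence[str] = ("cuda", "nvtx", "osrt")) -> List[str]:
    """Assemble an `nsys profile` command line for a target application."""
    return [
        "nsys", "profile",
        "--trace=" + ",".join(trace),  # APIs to trace
        "--output=" + output,          # base name of the report file
        *app,                          # target application and its arguments
    ]


def stats_command(report_path: str) -> List[str]:
    """Assemble an `nsys stats` command line that summarizes an existing report."""
    return ["nsys", "stats", report_path]


def render(argv: Sequence[str]) -> str:
    """Render an argument list as a shell-safe string for display or logging."""
    return " ".join(shlex.quote(part) for part in argv)


if __name__ == "__main__":
    # Placeholder application and report names; substitute your own.
    print(render(profile_command(["./my_app"], "my_report")))
    print(render(stats_command("my_report.qdrep")))
```

Keeping commands as argument lists, rather than one long string, means they can be handed to `subprocess.run` without shell quoting problems once the placeholders are replaced with real paths.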

2020.1 - Announcement Post

  • Support for Power9 Architecture (CLI)
  • Timeline improvements showing CUDA context "all streams" sub-tree
  • Ability to export stats from CLI and GUI
  • CLI Enhancements
    • Support for multiple simultaneous sessions
    • Improved 'nsys launch' terminal behavior
      • Forwards input and output streams
      • Forwards signals
      • Runs in the terminal until the launched process exits

Downloads

Available for profiling directly on Linux workstations and servers, including the NVIDIA DGX line, or remotely from a variety of hosts: Windows, Linux, or Mac OS X.

Download Now


Not profiling Linux workstations or servers?
Learn about other target platforms.

Documentation

Support

To provide feedback, request additional features, or report support issues, please use the Developer Forums.

System Requirements

Supported target operating systems for data collection:

  • Ubuntu 14.04, 16.04, 18.04 and 20.04
  • CentOS 7+*
  • Red Hat Enterprise Linux 7+*
* In distribution versions below 7.4, some features will be disabled unless the OS kernel has been upgraded to kernel version 4.3 or greater
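
As a quick way to check the kernel condition in the footnote above, the following sketch parses a kernel release string and compares it with the 4.3 threshold. The function names and the sample release strings are illustrative assumptions; only the 4.3 threshold comes from the requirement itself.

```python
import platform
import re
from typing import Tuple

MIN_KERNEL = (4, 3)  # Threshold taken from the footnote above.


def parse_kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '3.10.0-1160.el7.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_meets_minimum(release: str,
                         minimum: Tuple[int, int] = MIN_KERNEL) -> bool:
    """True if the kernel release is at least the given (major, minor) version."""
    return parse_kernel_version(release) >= minimum


def running_kernel_meets_minimum() -> bool:
    """Check the kernel of the machine this script runs on (meaningful on Linux)."""
    return kernel_meets_minimum(platform.release())


if __name__ == "__main__":
    # Sample release strings, made up for illustration.
    for sample in ("3.10.0-1160.el7.x86_64", "4.3.0", "5.4.0-42-generic"):
        print(sample, "->", kernel_meets_minimum(sample))
```

Comparing `(major, minor)` tuples avoids the classic string-comparison mistake in which "4.10" sorts before "4.3".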

Supported target hardware

  • GPU: Pascal or newer
  • CPU: x86-64, ARM Server Base System Architecture and Power9 processors*
* Intel Haswell architecture or newer is required for LBR sampling backtraces

Supported target software

  • 64-bit applications only
  • CUDA 9.0+ for CUDA tracing

Supported host operating systems for data visualization:

  • Windows 7+
  • Mac OS X 10.9+
  • Ubuntu 14.04, 16.04, 18.04 and 20.04

Release Highlights

2020.3 - Announcement Post

  • NVIDIA Ampere Architecture
  • CUDA 11.0
  • CUDA Graph correlation
  • OptiX
  • Vulkan KHR ray tracing extension
  • DirectX Raytracing (DXR) Tier 1.1
  • UX improvements

2020.2 - Announcement Post

  • Direct3D12 multi-GPU support
  • Vulkan 1.2 support
  • Windows video memory timeline improvements
    • Usage graph
    • Paging queue
    • Page eviction events
  • Hotkey to insert user annotation markers (NVTX)
  • UX improvements
  • Visual Studio Integration is now available

2020.1 - Announcement Post

  • Improved event viewer selection double click and timeline zoom CTRL+double click behavior
  • OS runtime trace backtrace threshold and depth controls
  • Timeline enhancements to show correlation of CPU wait on D3D12 fences

Downloads

Available for profiling directly on Linux workstations and servers, including the NVIDIA DGX line, or remotely from a variety of hosts: Windows, Linux, or Mac OS X.
Visual Studio Integration (requires Nsight Systems to be installed)

Download Now


Not profiling Windows targets?
Learn about other target platforms.

Documentation

Support

To provide feedback, request additional features, or report support issues, please use the Developer Forums.

System Requirements

Supported operating systems

  • Windows 10*
* Windows remote profiling is not supported at this time

Supported target hardware

  • GPU: Pascal or newer
  • CPU: x86-64 processors

Supported target software

  • 64-bit applications only
  • CUDA 10.0+ for CUDA tracing
  • Requires driver r411.63 or newer

Release Highlights

2019.4

  • Ftrace collection on Linux
  • Event table - alternative view of timeline data
  • Improved CUDA memory transfer color scheme
  • Android 9 support
  • Expanded export capabilities
    • New data sources: thread information, cuDNN, cuBLAS

2019.3

  • QNX OS runtime backtraces for long blocking functions
  • Exporters for SQLite & JSON
    • NVTX, CUDA, OS Runtime Trace (OSRT)
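
Once a report has been exported to SQLite, it can be inspected with any SQLite client. The sketch below uses Python's standard `sqlite3` module to list every table in a database along with its row count, which is a reasonable first step before writing queries. It deliberately makes no assumption about the exported schema: table names are read from the file itself, and the function name is a hypothetical helper.

```python
import sqlite3
from typing import Dict


def table_row_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Return {table_name: row_count} for every table in an open SQLite database."""
    counts: Dict[str, int] = {}
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    for name in names:
        # Table names come from the database itself; quote them as identifiers.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts
```

To use it on an exported file, open a connection with `sqlite3.connect(path_to_export)` and pass it to `table_row_counts`; the tables that turn out to be non-empty show which data sources were actually captured.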

Downloads

Nsight Systems is bundled as part of the following product development suites:

Jetson via NVIDIA SDK Manager

Documentation

Support

To provide feedback, request additional features, or report support issues, please use the Developer Forums.

System Requirements

Supported Target Hardware

  • ShieldTV
  • Jetson AGX Xavier, Jetson TX2, Jetson TX1
  • DRIVE AGX Pegasus, DRIVE AGX Xavier, DRIVE PX Parker AutoChauffeur, DRIVE PX Parker AutoCruise

Supported target operating systems for data collection:

  • QNX
  • Linux
  • Android

Supported host operating systems for data visualization:

  • Ubuntu 16.04 and 18.04

Features

Learn about feature support per target platform group

| Feature | Linux Workstations and Servers | Windows Workstations and Gaming PCs | Jetson Autonomous Machines | DRIVE Autonomous Vehicles |
| --- | --- | --- | --- | --- |
| View system-wide application behavior across CPUs and GPUs | | | | |
| CPU cores utilization, process, & thread activities | yes | yes | yes | yes |
| CPU thread periodic sampling backtraces | yes* | no | yes | yes |
| CPU thread blocked state backtraces | yes** | yes | yes | yes |
| CPU performance counter sampling | no | no | yes | yes |
| GPU workload trace | yes | yes | yes | yes |
| GPU context switch trace | no | no | yes | yes |
| SOC hypervisor trace | - | - | - | yes |
| SOC memory bandwidth sampling | - | - | yes | yes |
| SOC Accelerators trace | - | - | Xavier | Xavier |
| OS Event Trace | ftrace | ETW | ftrace | ftrace |
| Investigate CPU-GPU interactions and bubbles | | | | |
| User annotations API trace: NVIDIA Tools Extension API (NVTX) | yes | yes | yes | yes |
| CUDA API | yes | yes | yes | yes |
| CUDA libraries trace (cuBLAS, cuDNN & TensorRT) | yes | no | yes | yes |
| OpenGL API trace | yes | yes | yes | yes |
| Vulkan API trace | yes | yes | no | no |
| Direct3D12, Direct3D11, DXR, & PIX APIs | - | yes | - | - |
| OptiX | 7.1+ | 7.1+ | - | - |
| Bidirectional correlation of API and GPU workload | yes | yes | yes | yes |
| Identify GPU idle and sparse usage | yes | yes | yes | yes |
| Multi-GPU Graphics trace | - | Direct3D12 | - | - |
| Ready for big data | | | | |
| Fast GUI capable of visualizing in excess of 10 million events on laptops | yes | yes | yes | yes |
| Additional command line collection tool | yes | no | no | no |
| NV-Docker container support | yes | - | - | - |
| NVIDIA GPU Cloud support | yes | - | - | - |
| Minimum user privilege level | user | administrator | root | root |

* On Intel Haswell and newer CPU architectures

** Only with OS runtime trace enabled. Some syscalls, such as those issued from handcrafted assembly, may be missed. Backtraces may only appear if time thresholds are exceeded.
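
The user annotations row above refers to NVTX ranges that an application emits itself. The following sketch shows one way this might look from Python, assuming the `nvtx` package from PyPI is installed; if it is not, the code falls back to a no-op context manager so that it still runs. The function names and the toy workload are hypothetical.

```python
from contextlib import contextmanager

try:
    import nvtx  # NVTX Python bindings; assumed installed with `pip install nvtx`.

    annotate = nvtx.annotate
except ImportError:
    @contextmanager
    def annotate(message):
        """No-op stand-in so the example still runs without the nvtx package."""
        yield


def load_batch(size):
    # Everything inside this block appears as a named range when traced.
    with annotate("load_batch"):
        return list(range(size))


def train_step(batch):
    with annotate("train_step"):
        return sum(x * x for x in batch)


def run(steps=3, batch_size=4):
    """Toy loop standing in for a training loop; returns an accumulated value."""
    total = 0
    for _ in range(steps):
        total += train_step(load_batch(batch_size))
    return total


if __name__ == "__main__":
    print(run())
```

Named ranges like these make it easier to match stretches of the timeline to phases of your own code, such as data loading versus compute.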


What Users Are Saying

Tracxpoint

We noticed that our new Quadro P6000 server was ‘starved’ during training and we needed experts for supporting us. NVIDIA Nsight Systems helped us to achieve over 90 percent GPU utilization. A deep learning model that previously took 600 minutes to train, now takes only 90.

Felix Goldberg, Chief AI Scientist

NIH Center for Macromolecular Modeling and Bioinformatics at University of Illinois at Urbana-Champaign

Watch John Stone present how he achieved over a 3x performance increase in VMD, a popular tool for analyzing large biomolecular systems.

Related Media

Direct3D11 Feature Spotlight

The 2019.6 release aims to provide more detailed data collection, exploration, and collection control for all markets, ranging from high-performance computing to visual effects. 2019.6 introduces new data sources, improved visual data navigation, expanded CLI capabilities, and extended export coverage and statistics.

Watch Video

Command Line Sessions Feature Spotlight

The NVIDIA Nsight Systems 2020.1 release adds CLI support for the Power9 architecture, the ability to run multiple recording sessions simultaneously in the CLI, UX improvements, and stats export options in the GUI and CLI.

Watch Video

OpenMP Feature Spotlight

In the 2020.3 release, Nsight Systems adds the ability to analyze applications parallelized using OpenMP.

Watch Video

Statistics Driven Profiling

In the 2019.3 release, Nsight Systems adds the ability to analyze reports using statistics to identify opportunities for improving your GPU-accelerated application.

Watch Video

2019.4 Release Spotlight

The 2019.4 release aims to provide more detailed data collection, exploration, and collection control for all markets, ranging from high-performance computing to visual effects. 2019.4 introduces new data sources, improved visual data navigation, expanded CLI capabilities, and extended export coverage and statistics.

Watch Video

Vulkan Trace

In the 2019.3 release, Nsight Systems adds the ability to trace Vulkan on Windows and Linux targets, allowing you to inspect the CPU/GPU relationship and solve complicated frame stuttering issues in your Vulkan application.

Watch Video

Optimizing HPC simulation and visualization code

Watch John Stone, of the NIH Center for Macromolecular Modeling and Bioinformatics at University of Illinois at Urbana-Champaign, discuss how he achieved over a 3x performance increase of VMD, a popular tool for analyzing large biomolecular systems.

Watch Video

NVIDIA Jetson Partner Stories: Stereolabs

In the drone industry, the weight and size of the main board is critical. With the ZED stereo camera by Stereolabs, developers can capture the world in 3D and map 3D models of indoor and outdoor scenes up to 20 meters. The small form factor of the Jetson TX1 enables Stereolabs to bring advanced computer vision capabilities to smaller and smaller systems. See what is possible when these two technologies come together in drones to power the latest virtual reality applications.

Watch Video

NVIDIA System Profiler - Introduction

An introduction to the latest NVIDIA System Profiler. Includes a UI walkthrough and setup details for NVIDIA System Profiler on the NVIDIA Jetson Embedded Platform. Download and learn more here.

Watch Video