This page contains instructional videos for NVIDIA® Nsight™ Compute. These videos are a great resource for enhancing your understanding of all the features Nsight Compute has to offer.


Tools and Techniques for Making Efficient Use of GPUs

The seminar is split into two components: firstly, we look at programming GPUs and some of the technologies available to minimise bottlenecks both within and across nodes. This will include using standard language features to programme for GPUs (C++, Fortran), directives-based approaches such as OpenACC or OpenMP, Unified Memory, and an overview of GPUDirect for optimising the communications pipelines. In addition, some tools for scientific visualisation of data will be presented.

Secondly, the focus shifts to how to make use of profiling tools to analyse GPU accelerated applications to identify bottlenecks and ensure optimal performance. Specifically, there will be a demo of two profiling tools with a worked code example:

  • Nsight Systems (system-level analysis), starting at 41:12
  • Nsight Compute (GPU kernel analysis), starting at 1:14:15


Presented 03-30-2022 | Computational BioMed 2022 | Using Nsight Compute 2022.1 (CUDA 11.6) | New in 2022.1

Watch Now | Download the latest Nsight Compute

GPU Technology Conference Nov 2021: Optimizing CUDA Machine Learning Codes with Nsight Profiling Tools

This lab teaches how to use NVIDIA's Nsight tools for analyzing and optimizing CUDA applications. Attendees will be using Nsight Systems to analyze the overall application structure and explore parallelization opportunities. Nsight Compute will be used to analyze and optimize CUDA kernels, using an online machine learning code for 5G.


Presented 11-09-2021 | GTC 2021 | Nsight Compute 2021.3 (CUDA 11.5) | New in 2021.3 | View Lab
Download: Nsight Systems 2021.5, Nsight Compute 2021.3

GPU Technology Conference Nov 2021: Nsight Compute 2021.3 - Guided Analysis

Guided analysis is the set of features in NVIDIA Nsight Compute that provides expert analysis of collected profile data, including insights into performance issues, their causes, code locations, and options to fix them. This includes both a "behind the scenes" rules engine and UI elements to help non-experts profile and optimize CUDA kernels. This video gives an overview of how guided analysis works and highlights several new features that have been added in recent releases. This includes focus metrics, report cross-links, increased rule visibility, and documentation references. Guided analysis sets Nsight Compute apart from standard profiling techniques and is a core component that all Nsight Compute users should take advantage of.

To learn more, see GTC session [A31048] Understanding CUDA Application Behavior, Performance, and Optimization Just Got Easier with the Latest Developer Tools or visit


Presented 11-10-2021 | GTC Nov 2021: Nsight Compute 2021.3 (CUDA 11.5) | View on NVIDIA DevZone

GPU Technology Conference Nov 2021: A Complete Overview of Nsight Developer Tools

GTC session [A31048] Understanding CUDA Application Behavior, Performance, and Optimization Just Got Easier with the Latest Developer Tools

It's no secret that high-performance application programming is extremely challenging. Between getting the code correct, ensuring the compiler is generating the best version, debugging errors, and profiling (including Nsight Compute) and optimizing performance, it's a lot for even a team of developers to take on. The latest improvements in CUDA developer tools are focused on making each step in this process easier and more efficient. We'll first give a brief overview of the tools available for free to CUDA developers, then spend most of our time presenting the newest features and explaining the types of problems they'll help you solve. Topics will include the latest performance metrics, guided optimization workflows, new usability features, and more. We're constantly improving tools to enable CUDA developers to take advantage of the latest hardware and software features. If you’re doing compute on GPUs, this presentation will have something for you.


Presented 11-10-2021 | GTC Nov 2021: Nsight Compute 2021.3 (CUDA 11.5) | View on NVIDIA DevZone (Nsight Compute starts at 15:33)

Simplify HPC Development for CUDA on Arm with the Latest Nsight Developer Tools

As the CUDA on Arm HPC ecosystem rapidly expands, developers need the latest tools to make sure they can debug, profile, and optimize applications to take advantage of it all. Nsight developer tools and CUDA debuggers are adding new features all the time to help.

This session introduces the tools available for CUDA developers on Arm platforms, with a focus on the problems these tools help you solve. You'll learn about performance optimization metrics, developer workflows that enable more usage models, and usability features to get you the information you're looking for, faster.

This session is relevant to anyone developing for CUDA platforms, whether it's the first time you've picked up the tools, or you're a power user looking to learn the latest and greatest new features.


View Session

GPU Technology Conference 2021: Nsight Compute 2021.1 - Requests, Wavefronts, Sectors Metrics: Understanding and Optimizing Memory-Bound Kernels with Nsight Compute

Learn how you can get the most out of Nsight Compute to identify and solve memory access inefficiencies in your kernel code. This video discusses the basics of the memory system on modern GPUs, breaks down the meaning of the key metrics used to evaluate achieved performance, and shows how recently added features in Nsight Compute can improve your analysis workflow. These include roofline analysis, application replay, and a walk-through of the memory chart. The video presents concepts and examples using Nsight Compute that are directly applicable to your own optimization efforts.
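
To make the "requests" and "sectors" terms in the title concrete, here is a minimal sketch in Python. It is a simplified model, not how Nsight Compute derives its metrics: it assumes one load request per warp of 32 threads, 32-byte memory sectors, and 4-byte elements, and it ignores caching and wavefronts. The function names are hypothetical and chosen for illustration only.

```python
SECTOR_BYTES = 32  # size of one memory sector in bytes
WARP_SIZE = 32     # threads that issue one load request together


def sectors_touched(stride_elems, elem_bytes=4, base_addr=0):
    """Count distinct sectors touched when each thread of a warp loads one element.

    Thread `lane` reads `elem_bytes` bytes starting at
    base_addr + lane * stride_elems * elem_bytes.
    """
    sectors = set()
    for lane in range(WARP_SIZE):
        first = base_addr + lane * stride_elems * elem_bytes
        last = first + elem_bytes - 1
        # An element may straddle a sector boundary, so add every sector it covers.
        for sector in range(first // SECTOR_BYTES, last // SECTOR_BYTES + 1):
            sectors.add(sector)
    return len(sectors)


def ideal_sectors(elem_bytes=4):
    """Fewest sectors that can hold one element per thread (ceiling division)."""
    return (WARP_SIZE * elem_bytes + SECTOR_BYTES - 1) // SECTOR_BYTES


if __name__ == "__main__":
    for stride in (1, 2, 8, 32):
        touched = sectors_touched(stride)
        print(f"stride {stride:2d}: {touched:2d} sectors per request "
              f"(ideal {ideal_sectors()})")
```

In this model a unit-stride access needs the ideal number of sectors per request, while larger strides touch more sectors for the same amount of useful data. A high sectors-per-request ratio is the kind of inefficiency the video discusses.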


Presented 04-12-2021 | GTC 2021: Nsight Compute 2021.1 (CUDA 11.3) | New in 2021.1 | View on NVIDIA On-Demand

GPU Technology Conference 2021: Nsight Compute 2021.1 - Resource Tracking for new CUDA Toolkit 11.3 Resources

This GTC 2021 spotlight highlights Nsight Compute's support for CUDA Toolkit 11.3 resources with the Nsight Compute Resource Tracker, including:

  • User Objects
  • Memory Allocations
  • Memory Pools
  • Graph Nodes

Presented 04-12-2021 | GTC 2021: Nsight Compute 2021.1 (CUDA 11.3) | New in 2021.1 | View on YouTube

GPU Technology Conference 2021: CUDA is Evolving, and the Latest Developer Tools are Adapting to Keep Up

As the CUDA ecosystem rapidly expands, developers need the latest tools (including Nsight Compute) to make sure they can debug, profile, and optimize to take advantage of it all. Nsight developer tools along with other CUDA debuggers, profilers, and checkers are adding new features all the time to help. We'll go over the latest features in developer tools (including Nsight Compute) with a focus on the problems they help you solve. You'll learn about updated performance metrics for the latest architecture, new workflows that enable more usage models, and usability features to get you the information you're looking for, faster. This session is relevant to anyone developing for CUDA platforms, whether it's the first time you've picked up the tools or you're a power user looking to learn the latest and greatest new features.


Presented 04-12-2021 | GTC 2021: Nsight Compute 2021.1 (CUDA 11.3) | New in 2021.1 | View on YouTube

Supercomputing 2020: Nsight Compute 2020.2 - New Profiling mode: Application Replay

This Supercomputing 2020 spotlight introduces the new Application Replay mode, which complements the Kernel Replay mode.

First, you'll get an overview of the Nsight Compute profiling and collection process, diving into the normal 'Kernel' replay mode. Then, we'll look at situations when using the new 'Application' replay mode is advantageous.


Presented 11-09-2020 | Supercomputing 2020: Nsight Compute 2020.2 (CUDA 11.1) | New in 2020.2 | View on YouTube

Supercomputing 2020: Nsight Compute's Roofline and NVIDIA Ampere GPU Architecture Analysis

This Supercomputing 2020 spotlight reviews Nsight Compute's Roofline Analysis tool as well as new analysis features for NVIDIA's Ampere GPU architecture.

We will show how Roofline analysis provides a graphical view of a CUDA kernel's arithmetic intensity and FLOPS performance. With this analysis, it's easy to see how your kernel's performance compares to hardware limits, how much room there is for improvement, and which kinds of optimization are likely to help.

Next, we will explore how Nsight Compute allows you to monitor the throughput of the NVIDIA Ampere architecture's CUDA Asynchronous Copy feature.
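
The roofline idea described above can be illustrated with a short Python sketch. It uses the standard roofline formula, in which attainable performance is the smaller of the compute peak and the memory bandwidth multiplied by arithmetic intensity. The peak values below are hypothetical placeholders, not figures for any real GPU, and the function names are illustrative rather than part of Nsight Compute.

```python
def arithmetic_intensity(flop_count, bytes_moved):
    """FLOPs performed per byte of memory traffic."""
    return flop_count / bytes_moved


def attainable_flops(intensity, peak_flops, peak_bandwidth):
    """Roofline ceiling: min(compute peak, bandwidth * arithmetic intensity)."""
    return min(peak_flops, peak_bandwidth * intensity)


# Hypothetical device limits, for illustration only.
PEAK_FLOPS = 10.0e12  # FLOP/s
PEAK_BW = 1.0e12      # bytes/s

# Example kernel: SAXPY (y = a * x + y) on float32 data.
# Per element: 2 FLOPs (multiply, add) and 12 bytes (read x, read y, write y).
n = 1_000_000
ai = arithmetic_intensity(2 * n, 12 * n)
ceiling = attainable_flops(ai, PEAK_FLOPS, PEAK_BW)

# The ridge point is the intensity at which the two ceilings meet.
ridge = PEAK_FLOPS / PEAK_BW
bound = "memory" if ai < ridge else "compute"

if __name__ == "__main__":
    print(f"arithmetic intensity: {ai:.3f} FLOP/byte")
    print(f"attainable ceiling:   {ceiling:.3e} FLOP/s")
    print(f"kernel is {bound}-bound (ridge point at {ridge:.1f} FLOP/byte)")
```

A kernel whose intensity falls left of the ridge point is limited by memory bandwidth, so reducing memory traffic helps more than reducing arithmetic. To the right of the ridge point, the compute peak is the limit.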


Presented 11-11-2020 | Supercomputing 2020: Nsight Compute 2020.2 (CUDA 11.1) | New in 2020.1 | View on YouTube

Nsight Compute 2020.1 Spotlight

This NVIDIA Nsight Compute 2020.1 release spotlight highlights these new features:

  • Roofline Analysis for Visualization of Performance Headroom
  • NVIDIA Ampere Architecture metrics
    • Asynchronous Copy to Shared Memory
    • Compute Data Compression


Nsight Compute Overview | New in 2020.1 available 2020/05/28 (CUDA 11.0) | View on YouTube

GTC 2020 Lab: Modern CUDA Programming Hazards and the Linux Nsight Toolbox to Fix Them

In this hands-on lab, you'll learn from NVIDIA developers and experts about efficiently debugging, profiling, and optimizing CUDA applications on Linux. Through a set of exercises, you'll use the latest features in NVIDIA's suite of tools to detect and fix common correctness and performance issues in your applications.


Presented 05-21-2020 | GTC 2020: Nsight Compute 2020.1 (CUDA 11.0) | Lab Materials on GitHub

GTC 2020: Optimizing CUDA Kernels in HPC Simulation and Visualization Codes Using NVIDIA Nsight Compute 2020.1

NVIDIA engineers and the developers of molecular modeling tools at the University of Illinois will share their experiences using NVIDIA Nsight Compute to analyze and optimize several CUDA/OptiX kernels in HPC applications, such as VMD and NAMD. This presentation highlights several intermediate and advanced kernel profiling techniques and shows you how to iteratively identify bottlenecks and improve your kernel performance. You'll also get an overview of NVIDIA Nsight Compute 2020.1 features, including support for the new NVIDIA Ampere architecture and the new Roofline Analysis.


Presented 05-21-2020 | GTC 2020: Nsight Compute 2020.1 (CUDA 11.0) | View on DevZone

Blue Waters Webinar 2019: Introduction to NVIDIA Nsight Compute - A CUDA Kernel Profiler

Understanding and optimizing the runtime behavior of your code can be challenging but is often rewarded with significant performance gains. NVIDIA Nsight Compute is a CUDA kernel profiler that provides detailed performance data and offers guidance for optimizing your CUDA kernels. You'll learn how to collect a wide range of performance data for your CUDA kernels, how automatic rules help detect common performance pitfalls and offer guidance throughout the profile reports, how to quickly compare profiling results to evaluate the effects of your code changes, and how to customize the tool to best fit your optimization workflow.


Presented 11-06-2019 | GTC 2020: Nsight Compute 2019.4 (CUDA 10.2) | View on bluewaters.ncsa.illinois.edu

GTC Silicon Valley 2019 (ID: S9345): CUDA Kernel Profiling Using NVIDIA Nsight Compute

Learn about NVIDIA's developer tool, Nsight Compute, for optimizing your CUDA kernels. Nsight Compute is an interactive kernel profiler for CUDA applications that provides detailed performance metrics and API debugging via a user interface and command line tool. In addition, its baseline feature allows users to compare results within the tool. We will explain how Nsight Compute provides a customizable and data-driven user interface and metric collection and can be extended with analysis scripts for post-processing results.
View the slides (pdf)


Presented March 2019 | GTC 2020: Nsight Compute 2019.1 (CUDA 10.1) | View on DevZone

SIGGRAPH 2018: OptiX Profiling with Nsight Compute

In this hands-on live demo, we'll show how Nsight Compute can be used to profile applications built with NVIDIA OptiX. We'll identify performance bottlenecks in several OptiX applications and point out the key differences between vanilla CUDA programs and OptiX applications from a profiling perspective. We'll also demonstrate how to customize Nsight Compute to extract and present profiling information in the way that is most suitable for a given OptiX application. This talk will contain almost no slides and will instead focus on live usage of the tools involved.


Presented Aug 15 2018 | SIGGRAPH 2018: Nsight Compute 1.0 (CUDA 10.0) | View on on-demand.gputechconf.com
