Nsight Compute Videos
This page contains instructional videos for NVIDIA® Nsight™ Compute. These videos are a great resource for enhancing your understanding of all the features Nsight Compute has to offer.
GPU Technology Conference 2022: Nsight Compute 2022.1 - How to Understand and Optimize Shared Memory Accesses using Nsight Compute
To optimize your kernel's usage of shared memory efficiently, the key ingredients are: (1) a basic mental model of the hardware implementation of shared memory on modern NVIDIA GPUs, (2) a clear definition of the available performance metrics for shared memory in Nsight Compute, and (3) a map of the hardware's behavior to the observed values of these performance metrics. We'll cover these three requirements and walk through the detailed information provided for shared memory accesses in the profile reports generated by Nsight Compute. We'll discuss concepts such as shared memory requests, wavefronts, and bank conflicts using examples of common memory access patterns, including asynchronous data copies from global memory to shared memory as introduced by the NVIDIA Ampere GPU architecture.
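As a rough illustration of the bank-conflict concept mentioned above (not taken from the video), the following sketch models shared memory as 32 banks that are each 4 bytes wide, which is the commonly documented layout on recent NVIDIA GPUs. The function names `conflict_degree` and `warp_addresses` are hypothetical helpers invented for this example, and the model is simplified: the wavefront counts that Nsight Compute actually reports may differ.

```python
from collections import defaultdict

NUM_BANKS = 32         # assumption: 32 shared memory banks
BANK_WIDTH_BYTES = 4   # assumption: each bank is 4 bytes wide


def conflict_degree(byte_addresses):
    """Return the largest number of distinct 4-byte words hitting one bank.

    Threads that read the same word are served by a broadcast, so only
    distinct words within a bank count towards the conflict degree.
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // BANK_WIDTH_BYTES
        words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())


def warp_addresses(stride_words, warp_size=32):
    """Byte addresses accessed by one warp reading 4-byte words at a fixed stride."""
    return [lane * stride_words * BANK_WIDTH_BYTES for lane in range(warp_size)]


for stride in (0, 1, 2, 32, 33):
    print(f"stride {stride:2d} words -> conflict degree "
          f"{conflict_degree(warp_addresses(stride))}")
```

In this model a stride of 32 words puts every thread of the warp in the same bank, while a stride of 33 words (the usual padding trick) spreads the accesses across all banks again.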
Tools and Techniques for Making Efficient Use of GPUs
The seminar is split into two components: firstly, we look at programming GPUs and some of the technologies available to minimise bottlenecks both within and across nodes. This will include using standard language features to programme for GPUs (C++, Fortran), directives-based approaches such as OpenACC or OpenMP, Unified Memory, and an overview of GPUDirect for optimising the communications pipelines. In addition, some tools for scientific visualisation of data will be presented.
Secondly, the focus shifts to how to make use of profiling tools to analyse GPU accelerated applications to identify bottlenecks and ensure optimal performance. Specifically, there will be a demo of two profiling tools with a worked code example:
- Nsight Systems (system-level analysis), starting at 41:12
- Nsight Compute (GPU kernel analysis), starting at 1:14:15
Presented 03-30-2022 | Computational BioMed 2022 | Using Nsight Compute 2022.1 (CUDA 11.6) | New in 2022.1
GPU Technology Conference Nov 2021: Optimizing CUDA Machine Learning Codes with Nsight Profiling Tools
This lab teaches how to use NVIDIA's Nsight tools for analyzing and optimizing CUDA applications. Attendees will be using Nsight Systems to analyze the overall application structure and explore parallelization opportunities. Nsight Compute will be used to analyze and optimize CUDA kernels, using an online machine learning code for 5G.
GPU Technology Conference Nov 2021: Nsight Compute 2021.3 - Guided Analysis
Guided analysis is the set of features in NVIDIA Nsight Compute that provides expert analysis of collected profile data, including insights into performance issues, their causes, code locations, and options to fix them. This includes both a "behind the scenes" rules engine and UI elements to help non-experts profile and optimize CUDA kernels. This video gives an overview of how guided analysis works and highlights several new features that have been added in recent releases. This includes focus metrics, report cross-links, increased rule visibility, and documentation references. Guided analysis sets Nsight Compute apart from standard profiling techniques and is a core component that all Nsight Compute users should take advantage of.
To learn more, see GTC session [A31048] Understanding CUDA Application Behavior, Performance, and Optimization Just Got Easier with the Latest Developer Tools or visit
GPU Technology Conference Nov 2021: A Complete Overview of Nsight Developer Tools
GTC session [A31048] Understanding CUDA Application Behavior, Performance, and Optimization Just Got Easier with the Latest Developer Tools
It's no secret that high-performance application programming is extremely challenging. Between getting the code correct, ensuring the compiler is generating the best version, debugging errors, and profiling (including Nsight Compute) and optimizing performance, it's a lot for even a team of developers to take on. The latest improvements in CUDA developer tools are focused on making each step in this process easier and more efficient. We'll first give a brief overview of the tools available for free to CUDA developers, then spend most of our time presenting the newest features and explaining the types of problems they'll help you solve. Topics will include the latest performance metrics, guided optimization workflows, new usability features, and more. We're constantly improving tools to enable CUDA developers to take advantage of the latest hardware and software features. If you’re doing compute on GPUs, this presentation will have something for you.
Simplify HPC Development for CUDA on Arm with the Latest Nsight Developer Tools
As the CUDA on Arm HPC ecosystem rapidly expands, developers need the latest tools to make sure they can debug, profile, and optimize applications to take advantage of it all. Nsight developer tools and CUDA debuggers are adding new features all the time to help.
This session introduces the tools available for CUDA developers on Arm platforms, with a focus on the problems these tools help you solve. You'll learn about performance optimization metrics, developer workflows that enable more usage models, and usability features to get you the information you're looking for, faster.
This session is relevant to anyone developing for CUDA platforms, whether it's the first time you've picked up the tools, or you're a power user looking to learn the latest and greatest new features.
GPU Technology Conference 2021: Nsight Compute 2021.1 - Requests, Wavefronts, Sectors Metrics: Understanding and Optimizing Memory-Bound Kernels with Nsight Compute
Learn how you can get the most out of Nsight Compute to identify and solve memory access inefficiencies in your kernel code. This video discusses the basics of the memory system on modern GPUs, breaks down the meaning of the key metrics to evaluate the achieved performance, and shows how new features in Nsight Compute can improve your analysis workflow. This includes recently added features to the tool, including roofline analysis, application replay, and a walk-through of the memory chart. This video shows concepts and examples using Nsight Compute and can be directly applicable to your own optimization efforts.
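To make the relationship between requests and sectors more concrete, here is a minimal sketch (not from the video) that counts how many 32-byte sectors a single warp-wide load touches for different access strides. The 32-byte sector size is an assumption of this model, and `sectors_touched` and `strided_addresses` are hypothetical helper names introduced only for illustration.

```python
SECTOR_BYTES = 32  # assumption: memory is transferred in 32-byte sectors


def sectors_touched(byte_addresses, access_bytes=4):
    """Count the distinct sectors covered by one warp-wide memory request."""
    sectors = set()
    for addr in byte_addresses:
        first = addr // SECTOR_BYTES
        last = (addr + access_bytes - 1) // SECTOR_BYTES
        sectors.update(range(first, last + 1))
    return len(sectors)


def strided_addresses(stride_elems, elem_bytes=4, warp_size=32):
    """Byte addresses of one warp loading elements at a fixed element stride."""
    return [lane * stride_elems * elem_bytes for lane in range(warp_size)]


useful_bytes = 32 * 4  # 32 threads, each using one 4-byte element
for stride in (1, 2, 8):
    n = sectors_touched(strided_addresses(stride))
    efficiency = useful_bytes / (n * SECTOR_BYTES)
    print(f"stride {stride}: {n} sectors per request, "
          f"{efficiency:.1%} of fetched bytes used")
```

Under these assumptions a fully coalesced 4-byte load needs 4 sectors per request, whereas a stride of 8 elements needs 32 sectors to deliver the same amount of useful data.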
GPU Technology Conference 2021: Nsight Compute 2021.1 - Resource Tracking for new CUDA Toolkit 11.3 Resources
This GTC 2021 spotlight highlights Nsight Compute's support for CUDA Toolkit 11.3 resources with the Nsight Compute Resource Tracker, including:
- User Objects
- Memory Allocations
- Memory Pools
- Graph Nodes
GPU Technology Conference 2021: CUDA is Evolving, and the Latest Developer Tools are Adapting to Keep Up
As the CUDA ecosystem rapidly expands, developers need the latest tools (including Nsight Compute) to make sure they can debug, profile, and optimize to take advantage of it all. Nsight developer tools along with other CUDA debuggers, profilers, and checkers are adding new features all the time to help. We'll go over the latest features in developer tools (including Nsight Compute) with a focus on the problems they help you solve. You'll learn about updated performance metrics for the latest architecture, new workflows that enable more usage models, and usability features to get you the information you're looking for, faster. This session is relevant to anyone developing for CUDA platforms, whether it's the first time you've picked up the tools or you're a power user looking to learn the latest and greatest new features.
Supercomputing 2020: Nsight Compute 2020.2 - New Profiling mode: Application Replay
First, you'll get an overview of the Nsight Compute profiling and collection process, diving into the normal 'Kernel' replay mode. Then, we'll look at situations when using the new 'Application' replay mode is advantageous.
Supercomputing 2020: Nsight Compute's Roofline and NVIDIA Ampere GPU Architecture Analysis
This Supercomputing 2020 spotlight reviews Nsight Compute's Roofline Analysis tool as well as new analysis features for NVIDIA's Ampere GPU architecture.
We will show how Roofline analysis provides a graphical view of a CUDA kernel’s arithmetic intensity and FLOPS performance. Using this analysis, it's easy to see how your kernel's performance compares to hardware limits, how much room there is for improvement, and what kinds of optimizations can be employed to improve performance.
Next, we will explore how Nsight Compute allows you to monitor the throughput of the NVIDIA Ampere architecture's CUDA Asynchronous Copy feature.
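The roofline idea described above can be sketched in a few lines. This is a simplified model under stated assumptions, not how Nsight Compute computes its chart: the peak throughput and bandwidth figures are made-up placeholders, and `arithmetic_intensity` and `attainable_flops` are hypothetical helper names.

```python
def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte moved to or from memory."""
    return flops / bytes_moved


def attainable_flops(intensity, peak_flops, peak_bandwidth):
    """Roofline bound: limited by either compute or memory bandwidth."""
    return min(peak_flops, intensity * peak_bandwidth)


# Hypothetical device limits, chosen only to keep the arithmetic readable.
PEAK_FLOPS = 10.0e12      # FLOP/s
PEAK_BANDWIDTH = 1.0e12   # bytes/s
ridge_point = PEAK_FLOPS / PEAK_BANDWIDTH  # intensity where the two roofs meet

# SAXPY on float32: 2 FLOPs per element, 12 bytes moved (read x, read y, write y).
saxpy_ai = arithmetic_intensity(2, 12)
bound = attainable_flops(saxpy_ai, PEAK_FLOPS, PEAK_BANDWIDTH)

kind = "memory-bound" if saxpy_ai < ridge_point else "compute-bound"
print(f"SAXPY intensity: {saxpy_ai:.3f} FLOP/byte ({kind})")
print(f"Attainable: {bound / 1e9:.1f} GFLOP/s of {PEAK_FLOPS / 1e9:.0f} GFLOP/s peak")
```

A kernel whose intensity lies left of the ridge point is limited by memory bandwidth, so reducing memory traffic is likely to help more than reducing arithmetic.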
Nsight Compute 2020.1 Spotlight
This NVIDIA Nsight Compute 2020.1 release spotlight highlights these new features:
- Roofline Analysis for Visualization of Performance Headroom
- NVIDIA Ampere Architecture metrics
- Asynchronous Copy to Shared Memory
- Compute Data Compression
GTC 2020 Lab: Modern CUDA Programming Hazards and the Linux Nsight Toolbox to Fix Them
In this hands-on lab, you'll learn from NVIDIA developers and experts about efficiently debugging, profiling, and optimizing CUDA applications on Linux. Through a set of exercises, you'll use the latest features in NVIDIA's suite of tools to detect and fix common issues of correctness and performance in your applications.
Presented 05-21-2020 | GTC 2020: Nsight Compute 2020.1 (CUDA 11.0) | Lab Materials on GitHub
GTC 2020: Optimizing CUDA Kernels in HPC Simulation and Visualization Codes Using NVIDIA Nsight Compute 2020.1
NVIDIA engineers and the developers of molecular modeling tools at University of Illinois will share their experiences using NVIDIA Nsight Compute to analyze and optimize several CUDA/OptiX kernels in HPC applications, such as VMD and NAMD. This presentation highlights several intermediate and advanced kernel profiling techniques and shows you how to iteratively identify bottlenecks and improve your kernel performance. You'll also get an overview of NVIDIA Nsight Compute 2020.1 features, including support for the new NVIDIA Ampere architecture and the new Roofline Analysis.
Presented 05-21-2020 | GTC 2020: Nsight Compute 2020.1 (CUDA 11.0) | View on DevZone
Blue Waters Webinar 2019: Introduction to NVIDIA Nsight Compute - A CUDA Kernel Profiler
Understanding and optimizing the runtime behavior of your code can be a challenging effort but is often rewarded with significant performance gains. NVIDIA Nsight Compute is a CUDA kernel profiler that provides detailed performance data and offers guidance for optimizing your CUDA kernels. You'll learn how to collect a wide range of performance data for your CUDA kernels, how automatic rules help in detecting common performance pitfalls and offer guidance through the profile reports, how to quickly compare profiling results to evaluate the effects of your code changes, and how to customize the tool to best fit your optimization workflow.
Presented 11-06-2019 | GTC 2020: Nsight Compute 2019.4 (CUDA 10.2) | View on bluewaters.ncsa.illinois.edu
GTC Silicon Valley 2019 ID: S9345: CUDA Kernel Profiling Using NVIDIA Nsight Compute
Learn about NVIDIA's developer tool, Nsight Compute, for optimizing your CUDA kernels. Nsight Compute is an interactive kernel profiler for CUDA applications that provides detailed performance metrics and API debugging via a user interface and command line tool. In addition, its baseline feature allows users to compare results within the tool. We will explain how Nsight Compute provides a customizable and data-driven user interface and metric collection and can be extended with analysis scripts for post-processing results.
View the slides (pdf)
Presented March 2019 | GTC 2020: Nsight Compute 2019.1 (CUDA 10.1) | View on DevZone
SIGGRAPH 2018: OptiX Profiling with Nsight Compute
In this hands-on live demo, we'll show how Nsight Compute can be used to profile applications built with NVIDIA OptiX. We'll identify performance bottlenecks in several OptiX applications and identify the key differences between vanilla CUDA programs and OptiX applications from a profiling perspective. We'll also demonstrate how to customize Nsight Compute to extract and present profiling information in the way that is most suitable for a given OptiX application. This talk will contain almost no slides and instead focus on live usage of the tools involved.
Presented Aug 15 2018 | SIGGRAPH2018: Nsight Compute 1.0 (CUDA 10.0) | View on on-demand.gputechconf.com