
Optimizing GPU Utilization with Nsight Compute 2021.3


NVIDIA announced Nsight Compute 2021.3, which adds new features for measuring and modeling occupancy, correlating source and assembly code, and a hierarchical roofline model for identifying bottlenecks caused by accessing cache memory.

Occupancy Calculator

Nsight Compute 2021.3 adds a new Occupancy Calculator activity that helps you understand the hardware resource utilization of your kernels and model how adjustments could affect occupancy.

Occupancy is the ratio of active warps per SM to the theoretical maximum number of active warps. Low occupancy may indicate kernels that are too small, unbalanced workloads, or resource contention, all of which can limit the performance of a kernel on a GPU with a specific set of available resources.
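To make the ratio concrete, here is a minimal sketch of a theoretical occupancy estimate. It is not the algorithm used by the Occupancy Calculator. The `SMLimits` values are illustrative placeholders rather than the limits of any specific GPU, the names are hypothetical, and the model ignores details such as register and shared memory allocation granularity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-SM resource limits (assumed, illustrative values only)."""
    warp_size: int = 32
    max_warps: int = 64
    max_blocks: int = 32
    registers: int = 65536
    shared_mem_bytes: int = 49152


def theoretical_occupancy(threads_per_block, regs_per_thread,
                          smem_per_block, sm=SMLimits()):
    """Return (occupancy, limiting_resource) under a simplified model."""
    # Warps needed by one block, rounded up to whole warps.
    warps_per_block = -(-threads_per_block // sm.warp_size)
    regs_per_block = regs_per_thread * warps_per_block * sm.warp_size

    # How many blocks fit on one SM according to each resource.
    blocks_by_resource = {
        "warps": sm.max_warps // warps_per_block,
        "blocks": sm.max_blocks,
        "registers": (sm.registers // regs_per_block
                      if regs_per_block else sm.max_blocks),
        "shared memory": (sm.shared_mem_bytes // smem_per_block
                          if smem_per_block else sm.max_blocks),
    }

    # The scarcest resource decides how many blocks are resident.
    limiter = min(blocks_by_resource, key=blocks_by_resource.get)
    active_warps = blocks_by_resource[limiter] * warps_per_block
    return active_warps / sm.max_warps, limiter


if __name__ == "__main__":
    for regs in (32, 64, 128):
        occ, limiter = theoretical_occupancy(256, regs, 0)
        print(f"{regs} regs/thread: occupancy {occ:.2f}, limited by {limiter}")
```

Under these assumed limits, doubling the registers per thread from 32 to 64 halves the number of resident blocks, which is the kind of what-if adjustment the Occupancy Calculator lets you explore interactively.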

Screen display of the Nsight Compute Occupancy Calculator feature showing memory occupancy and GPU hardware utilization.
Figure 1. Display of Nsight Compute Occupancy Calculator

Command line source page

This release adds a highly requested feature: access to the information from the GUI Source page directly from the command line. With the --page source flag, you can print the lines of source, PTX, or assembly, along with the metrics collected for those lines.

This feature gives you more flexibility in analyzing the collected data and makes it easier to script and post-process results for further reporting and analysis.
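As a rough sketch of how this could fit into a script, the following wraps the command in a small helper. It assumes the profile was saved to a report file beforehand and is loaded with the `--import` option of `ncu`. The file name `profile.ncu-rep` and the helper names are hypothetical, and the output format is not parsed here because it is not described in this post.

```python
import shutil
import subprocess


def source_page_command(report_path):
    """Build the ncu invocation that prints the Source page of a saved report."""
    # --page source is the flag described above; --import is assumed to be
    # how the previously saved report is supplied.
    return ["ncu", "--import", str(report_path), "--page", "source"]


def print_source_page(report_path):
    """Return the Source page text, or None when ncu is not on PATH."""
    if shutil.which("ncu") is None:
        return None
    result = subprocess.run(
        source_page_command(report_path),
        capture_output=True,
        text=True,
        check=True,  # raise if ncu reports an error
    )
    return result.stdout


if __name__ == "__main__":
    # "profile.ncu-rep" is a hypothetical report file name.
    print(" ".join(source_page_command("profile.ncu-rep")))
```

The returned text can then be filtered or written to a file by whatever reporting step follows.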

Figure 2. Example of the command line source output feature.

Hierarchical roofline

The Roofline chart now supports a hierarchical roofline, which adds rooflines for the L1 and L2 caches in addition to device memory. You can see how close your kernels are to the bandwidth limits of each memory level to determine whether they have bottlenecks related to accessing memory.
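The idea behind a hierarchical roofline can be sketched with the standard roofline formula, where attainable performance is the smaller of peak compute and arithmetic intensity times bandwidth, evaluated once per memory level. This is an illustrative calculation and not how Nsight Compute computes its chart. All numbers and names below are made-up assumptions.

```python
def attainable_flops(flops, bytes_moved, peak_flops, bandwidth):
    """Roofline bound: min(peak compute, arithmetic intensity * bandwidth)."""
    intensity = flops / bytes_moved  # FLOP per byte at this memory level
    return min(peak_flops, intensity * bandwidth)


def roofline_report(flops, seconds, bytes_by_level, bandwidth_by_level,
                    peak_flops):
    """Return, per memory level, achieved performance as a fraction of its roof."""
    achieved = flops / seconds
    report = {}
    for level, bytes_moved in bytes_by_level.items():
        roof = attainable_flops(flops, bytes_moved, peak_flops,
                                bandwidth_by_level[level])
        report[level] = achieved / roof
    return report


if __name__ == "__main__":
    # Hypothetical kernel and device numbers in arbitrary consistent units.
    report = roofline_report(
        flops=100.0,
        seconds=1.0,
        bytes_by_level={"L1": 100.0, "L2": 50.0, "DRAM": 25.0},
        bandwidth_by_level={"L1": 400.0, "L2": 200.0, "DRAM": 50.0},
        peak_flops=1000.0,
    )
    for level, fraction in report.items():
        print(f"{level}: {fraction:.0%} of roof")
    print("closest to its roof:", max(report, key=report.get))
```

The level whose fraction is closest to 1 is the one most likely to be limiting the kernel, which is what the per-level rooflines in the chart help you see at a glance.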

Screen image of hierarchical roofline output graph to show that memory access is optimized, or needs optimizing.
Figure 3. Nsight Compute displaying roofline hierarchy comparison.

Additional enhancements

Further capabilities include more configurable baseline comparisons, direct access to source-level information from the CLI, and additional SSH functionality. 

For more information about debugging and profiling tools, register to join this NVIDIA GTC technical session: Understanding CUDA Application Behavior, Performance, and Optimization Just Got Easier with the Latest Developer Tools.

For more information, see the following resources:

To view the latest tutorial information, see Nsight Compute videos and Nsight Compute posts.
