
Nsight Compute 2020.3 Simplifies CUDA Kernel Profiling and Optimization

The 2020.3 release of NVIDIA Nsight Compute, included in CUDA Toolkit 11.2, introduces several new features that simplify CUDA kernel profiling and optimization.

Profile Series

The new Profile Series feature enables you to configure ranges for multiple kernel parameters. Nsight Compute automatically iterates through the ranges and profiles each combination to help you find the best configuration.

These parameters include the number of registers per thread, shared memory sizes, and the shared memory configuration. Profile Series automates a search that previously had to be done by hand, and it can identify better-performing configurations with minimal changes to the source code.

The Profile Series configuration is available in the Interactive Profiling activity.
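To make concrete what Profile Series automates, the sketch below performs a small version of that sweep by hand with the CUDA runtime API. It assumes that the "shared memory sizes" and "shared memory configuration" parameters correspond to the dynamic shared memory size passed at launch and the preferred shared memory carveout set with cudaFuncSetAttribute. The kernel reverseInBlock, the helper timeConfig, and the swept values are hypothetical. Registers per thread cannot be changed at run time, so the sketch only notes where that limit would be set.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical kernel: each block stages its slice of the input in dynamic
// shared memory and writes it back reversed within the block.
__global__ void reverseInBlock(const float* in, float* out, int n) {
    extern __shared__ float tile[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int src = blockDim.x - 1 - threadIdx.x;        // mirrored slot in the tile
    int j = blockIdx.x * blockDim.x + src;         // global index held in that slot
    if (i < n) tile[threadIdx.x] = in[i];
    __syncthreads();
    if (i < n && j < n) out[i] = tile[src];
}

// Time one launch for a given carveout hint (percent) and dynamic shared
// memory size in bytes; smemBytes must be >= block * sizeof(float).
float timeConfig(const float* in, float* out, int n, int block,
                 int carveoutPercent, size_t smemBytes) {
    CHECK(cudaFuncSetAttribute(reverseInBlock,
                               cudaFuncAttributePreferredSharedMemoryCarveout,
                               carveoutPercent));
    int grid = (n + block - 1) / block;
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    reverseInBlock<<<grid, block, smemBytes>>>(in, out, n);  // warm-up launch
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(start));
    reverseInBlock<<<grid, block, smemBytes>>>(in, out, n);  // timed launch
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms;
}

int main() {
    const int n = 1 << 22, block = 256;
    std::vector<float> h(n);
    for (int k = 0; k < n; ++k) h[k] = static_cast<float>(k);
    float *in = nullptr, *out = nullptr;
    CHECK(cudaMalloc(&in, n * sizeof(float)));
    CHECK(cudaMalloc(&out, n * sizeof(float)));
    CHECK(cudaMemcpy(in, h.data(), n * sizeof(float), cudaMemcpyHostToDevice));

    // Sweep values are arbitrary illustrations, not Nsight Compute defaults.
    const int carveouts[] = {cudaSharedmemCarveoutMaxL1, 50,
                             cudaSharedmemCarveoutMaxShared};
    const size_t smemSizes[] = {block * sizeof(float), 16 * 1024, 48 * 1024};
    // Registers per thread are fixed at compile time (e.g. nvcc -maxrregcount=N),
    // so sweeping them by hand means rebuilding once per value.
    for (int c : carveouts)
        for (size_t s : smemSizes)
            std::printf("carveout %3d%%  dynamic smem %6zu B  %.3f ms\n",
                        c, s, timeConfig(in, out, n, block, c, s));

    CHECK(cudaMemcpy(h.data(), out, n * sizeof(float), cudaMemcpyDeviceToHost));
    // Element 0 is taken from the last slot of block 0.
    std::printf("out[0] = %.0f (expected %d)\n", h[0], block - 1);
    CHECK(cudaFree(in));
    CHECK(cudaFree(out));
    return 0;
}
```

Every combination here needs its own launch and timing code, and the result is only an elapsed time. Profile Series runs the equivalent sweep from the UI and collects a full profile for each combination.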

Import Source

This highly requested feature enables you to archive source files within your Nsight Compute results. Anyone with access to the results can then resolve performance data to lines of source code, even without access to the original source files. Sharing results with teammates and archiving them for future analysis are just two of its uses.

You can import source files with the --import-source command-line option or through the UI when configuring the profile. Source files can also be imported later through the Profile menu.
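As a minimal sketch of the command-line workflow, the hypothetical file saxpy.cu below records its build and profile commands in a header comment. Correlating performance data with source lines relies on line information, so the file is compiled with nvcc -lineinfo. The value syntax shown for the option (--import-source yes) and the report name saxpy_report are assumptions; confirm the exact form with ncu --help for your version.

```cuda
// saxpy.cu (hypothetical example file)
//
// 1. Build with line information so Nsight Compute can correlate
//    SASS instructions with CUDA source lines:
//      nvcc -lineinfo -o saxpy saxpy.cu
// 2. Profile and import the correlated source file into the report
//    (option syntax assumed; confirm with `ncu --help`):
//      ncu --import-source yes -o saxpy_report ./saxpy
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Exit with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// y[i] = a * x[i] + y[i]
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);
    float *x = nullptr, *y = nullptr;
    check(cudaMalloc(&x, bytes), "cudaMalloc x");
    check(cudaMalloc(&y, bytes), "cudaMalloc y");
    check(cudaMemcpy(x, hx.data(), bytes, cudaMemcpyHostToDevice), "copy x");
    check(cudaMemcpy(y, hy.data(), bytes, cudaMemcpyHostToDevice), "copy y");
    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
    check(cudaGetLastError(), "saxpy launch");
    check(cudaMemcpy(hy.data(), y, bytes, cudaMemcpyDeviceToHost), "copy back");
    std::printf("y[0] = %.1f\n", hy[0]);  // prints "y[0] = 5.0" (3 * 1 + 2)
    check(cudaFree(x), "cudaFree x");
    check(cudaFree(y), "cudaFree y");
    return 0;
}
```

Because the source is embedded in the resulting report, a teammate can open it and view per-line metrics without a copy of saxpy.cu.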

This release also includes several other new capabilities, among them memory allocation tracking, support for derived metrics, and additional configuration options and guidance for the recently released Application Replay feature.

For more information, see the Nsight Compute release notes.

Download Nsight Compute 2020.3, and check out the featured spotlight videos demonstrating Roofline Analysis and Application Replay.
