DEVELOPER BLOG

HPC |

Nsight Compute 2020.3 Simplifies CUDA Kernel Profiling and Optimization

The 2020.3 release of NVIDIA Nsight Compute included in CUDA Toolkit 11.2 introduces several new features that simplify the process of CUDA kernel profiling and optimization.

Profile Series

The new Profile Series feature allows developers to configure ranges for multiple kernel parameters. Nsight Compute will automatically iterate through the ranges and profile each combination to help you find the best configuration. These parameters include the number of registers per thread, shared memory sizes, and the shared memory configuration. This automates a process that previously would need manual support, and can provide optimized performance configurations with minimal changes to source code.

The Profile Series configuration is available in the UI’s Interactive Profiling activity.

Import Source

This highly requested feature enables users to archive source files within their Nsight Compute results. It allows any user with access to the results to resolve performance data to lines in the source code, even if they don’t have access to the original source files. Sharing results with teammates and archiving them for future analysis are just a couple of uses for this new feature. Users can import source files with the (–import-source) command-line option or via the UI when configuring the profile. 

Source Files can also be imported later via the Profile Menu.

Additionally, there are several other new capabilities available in this release. These include Memory Allocation Tracking, support for derived metrics, and additional configurations and advice for the recently released Application Replay feature.

For complete details, check out the Nsight Compute Release Notes.

Download Nsight Compute 2020.3 and check out featured spotlight video demonstrations on Roofline Analysis and Application Replay

View all Nsight DevBlogs.