Date: 2021-03-01

Nsight Compute 2021.3 - New Features

Resolved Issues

Fixed that kernels with the same name and launch configuration were in some scenarios associated with the wrong profiling results during application replay.

Fixed an issue with binary forward compatibility of the report format.

Fixed an issue with applications calling into the CUDA API during process teardown.

Fixed an issue when profiling applications that use pre-CUDA API 3.1 contexts.

Fixed a crash when resolving files on the Source page.

Fixed that opening reports with large embedded CUBINs would hang the UI.

Fixed an issue with remote profiling on a target where the UI is already launched.

General

Added support for the CUDA toolkit 11.5.

Added a new rule for detecting inefficient memory access patterns in the L1TEX cache and L2 cache.

Added a new rule for detecting high usage of system or peer memory.

Added the new IAction::sass_by_pc function to the NvRules API.

The Python-based report interface is now also available for Windows and macOS hosts.

Added Hierarchical Roofline section files in a new "roofline" section set.

Added support for collecting CPU call stack information.
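As an illustration of the Python-based report interface mentioned above, the following minimal sketch lists the kernel names stored in a report. It assumes the ncu_report module that ships with Nsight Compute (typically under the installation's extras/python directory) is on the Python path. The helper names kernel_names and load_kernel_names are hypothetical and not part of the tool.

```python
def kernel_names(report):
    """Return the names of all profiled kernels in a loaded report.

    `report` is expected to behave like the context object returned by
    ncu_report.load_report(): it exposes num_ranges() and range_by_idx(),
    and each range exposes num_actions() and action_by_idx().
    """
    names = []
    for range_idx in range(report.num_ranges()):
        rng = report.range_by_idx(range_idx)
        for action_idx in range(rng.num_actions()):
            # Each action corresponds to one profiled kernel launch.
            names.append(rng.action_by_idx(action_idx).name())
    return names


def load_kernel_names(path):
    """Hypothetical convenience wrapper: load a report file and list its kernels."""
    # Imported lazily so this file can be read or tested on hosts
    # where Nsight Compute is not installed.
    import ncu_report
    return kernel_names(ncu_report.load_report(path))
```

Calling load_kernel_names("my_report.ncu-rep") on a host with Nsight Compute installed would return one entry per profiled kernel launch; the report file name here is only a placeholder.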

NVIDIA Nsight Compute

Added support for new remote profiling SSH connection and authentication options as well as local SSH configuration files.

Added an Occupancy Calculator which can be opened directly from a profile report or as a new activity. It offers feature parity with the CUDA Occupancy Calculator spreadsheet.

Added new Baselines tool window to manage (hide, update, re-order, save/load) baseline selections.

The Source page views now support multi-line/cell selection and copy/paste. Different colors are used for highlighting selections and correlated lines.

The search edit on the Source page now supports Shift+Enter to search in reverse direction.

The Memory Workload Analysis Chart can be configured to show throughput values instead of transferred bytes.

The Profile activity now supports the --devices option.

The NVLink Topology diagram displays per-NVLink metrics.

Added a new tool window showing the CPU call stack at the location where the current thread was suspended during interactive profiling activities.

If enabled, the Call Stack / NVTX page of the profile report shows the captured CPU call stack for the selected kernel launch.

NVIDIA Nsight Compute CLI

Added support for printing source/metric content with the new --page source and --print-source command line options.

Added new option --call-stack to enable collecting the CPU call stack for every profiled kernel launch.
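The new command line options above might be combined as in the following sketch, which only builds the ncu argument lists and does not run the profiler. The helper names, the application path, and the report name are hypothetical. The "sass" value passed to --print-source is an assumption about one accepted view name; check ncu --help for the values your version supports.

```python
import shlex


def ncu_profile_cmd(app_args, report_name, call_stack=True):
    """Build an ncu command that profiles an application into a report file."""
    cmd = ["ncu", "-o", report_name]
    if call_stack:
        # Collect the CPU call stack for every profiled kernel launch.
        cmd.append("--call-stack")
    return cmd + list(app_args)


def ncu_source_cmd(report_file, view="sass"):
    """Build an ncu command that prints the source page of an existing report.

    The default view name "sass" is an assumption, not taken from these notes.
    """
    return ["ncu", "--import", report_file,
            "--page", "source", "--print-source", view]


# Print shell-ready command lines for a hypothetical application and report.
print(shlex.join(ncu_profile_cmd(["./my_app"], "my_report")))
print(shlex.join(ncu_source_cmd("my_report.ncu-rep")))
```

Returning argument lists rather than strings keeps the commands safe to pass to subprocess.run without shell quoting issues.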

Resolved Issues

Fixed that memory_* metrics could not be collected with the --metrics option.

Fixed that selection and copy/paste were not supported for section header tables on the Details page.

Fixed issues with the Source page when collapsing the content.

Fixed that the UI could crash when applying rules to a new profile result.

Fixed that PC Sampling metrics were not available for Profile Series.

Fixed that local profiling did not work if no non-loopback address was configured for the system.

