Date: 2020-03-01

Nsight Compute 2020.3 - New Features

General

Support for CUDA Toolkit 11.2 Update 1 was added.

Added support for LDSM instruction-level metrics.

NVIDIA Nsight Compute

LDSM instruction-level metrics are shown in the Source page and memory tables.

Improved reporting and documentation for collecting Profile Series.

Frozen columns in the Source page are automatically scrolled into view.

Resolved Issues

Fixed an issue when profiling multi-threaded applications.

Fixed an issue where NVIDIA Nsight Compute would not automatically restart when using Reset Application Data.

Fixed issues with target applications using libstdc++.

Fixed an issue when collecting single-pass metrics in multiple Nsight Compute instances.

Fixed an issue when using Kernel ID and setting Launch Capture Count as non-zero in the UI's Profile activity.

Fixed an issue that prevented different users on the same Linux system from using NVIDIA Nsight Compute in shared instance mode.

Fixed an issue that prevented resources from being properly renamed using NVTX information in the UI.

General

Support for CUDA Toolkit 11.2 was added.

Added support for derived metrics in section files. Derived metrics can be used to create new metrics based on existing metrics and constants. See the Customization Guide for details.
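As a rough illustration of the idea (not the section-file syntax, which is described in the Customization Guide), the following Python sketch combines two existing metric values and a constant into a new value. The metric names, the values, and the helper derive_ratio_pct are hypothetical placeholders.

```python
# Hypothetical collected metric values; names and numbers are placeholders,
# not real Nsight Compute metrics.
collected = {
    "metric_a.sum": 1200.0,
    "metric_b.sum": 4800.0,
}


def derive_ratio_pct(metrics, numerator, denominator, scale=100.0):
    """Build a new value from two existing metrics and a constant scale."""
    denom = metrics[denominator]
    if denom == 0:
        # Assumption: report 0 when the denominator metric is zero.
        return 0.0
    return scale * metrics[numerator] / denom


derived_pct = derive_ratio_pct(collected, "metric_a.sum", "metric_b.sum")
print(derived_pct)
```

In a real section file the expression is evaluated by the tool itself; the sketch only shows the kind of arithmetic a derived metric expresses.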

Added a new Import Source (--import-source) option to the UI and command line to permanently import source files into the report, when available.

Added a new section that shows the number of NVLink metrics on supported systems.

Added a new launch__func_cache_config metric to the Launch Statistics section.

Added new branch efficiency metrics to the Source Counters section, including smsp__sass_average_branch_targets_threads_uniform.pct to replace nvprof's branch_efficiency, as well as instruction-level metrics smsp__branch_targets_threads_divergent, smsp__branch_targets_threads_uniform and branch_inst_executed.
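To make the relationship between the instruction-level counters and the percentage concrete, here is a minimal Python sketch. It assumes the percentage is the share of uniform branch targets among all branch targets (uniform plus divergent); the tool's exact definition may differ, and the function name is hypothetical.

```python
def branch_uniformity_pct(uniform_targets, divergent_targets):
    """Share of branch targets on which all active threads agreed, in percent.

    Assumption: total branch targets = uniform + divergent.
    """
    total = uniform_targets + divergent_targets
    if total == 0:
        # Assumption: a kernel without branches is reported as 0 here.
        return 0.0
    return 100.0 * uniform_targets / total


# Example with made-up counter values for a single kernel.
print(branch_uniformity_pct(90, 10))
```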

A warning is shown if kernel replay starts staging GPU memory to CPU memory or the file system.

Section and rule files are deployed to a versioned directory in the user's home directory to allow easier editing of those files, and to prevent modifying the base installation.

Removed support for NVLINK (nvl*) metrics due to a potential application hang during data collection. The metrics will be added back in a future version of the driver/tool.

NVIDIA Nsight Compute

Added support for Profile Series. Series allow you to profile a kernel with a range of configurable parameters to analyze the performance of each combination.
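The notion of "each combination" can be sketched in Python as a Cartesian product over the parameter ranges. The parameter names and values below are hypothetical and do not correspond to the actual series settings offered by the tool.

```python
from itertools import product

# Hypothetical parameter ranges; real Profile Series parameters differ.
series_parameters = {
    "param_a": [1, 2],
    "param_b": ["low", "medium", "high"],
}


def enumerate_series(parameters):
    """Return one configuration dict per combination of parameter values."""
    names = list(parameters)
    value_lists = [parameters[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


combinations = enumerate_series(series_parameters)
for config in combinations:
    # Each entry stands for one profiled run of the same kernel.
    print(config)
```

With two values for the first parameter and three for the second, the sketch yields six configurations, which is why the number of profiled runs grows quickly as ranges are added.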

Added a new Allocations view to the Resources tool window which shows the state of all current memory allocations.

Added a new Memory Pools view to the Resources tool window which shows the state of all current memory pools.

Added coverage of peer memory to the Memory Chart.

The Source page now shows the number of excessive sectors requested from L1 or L2, e.g. due to uncoalesced memory accesses.
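A small Python model can show where excessive sectors come from. It assumes 32-byte sectors and treats the ideal count as the minimum number of sectors needed to hold the bytes a warp requests; the tool's exact accounting may differ, and all names are hypothetical.

```python
SECTOR_BYTES = 32  # assumed sector size
WARP_SIZE = 32


def sectors_touched(addresses, access_bytes):
    """Count the distinct sectors covered by one access per thread."""
    sectors = set()
    for addr in addresses:
        first = addr // SECTOR_BYTES
        last = (addr + access_bytes - 1) // SECTOR_BYTES
        sectors.update(range(first, last + 1))
    return len(sectors)


def excessive_sectors(addresses, access_bytes):
    """Sectors requested beyond the minimum needed for the requested bytes."""
    total_bytes = len(addresses) * access_bytes
    ideal = (total_bytes + SECTOR_BYTES - 1) // SECTOR_BYTES
    return sectors_touched(addresses, access_bytes) - ideal


# Coalesced: consecutive 4-byte loads across the warp.
coalesced = [4 * t for t in range(WARP_SIZE)]
# Uncoalesced: each thread loads 4 bytes with a 128-byte stride.
strided = [128 * t for t in range(WARP_SIZE)]

print(excessive_sectors(coalesced, 4))
print(excessive_sectors(strided, 4))
```

Under these assumptions the coalesced pattern needs no extra sectors, while the strided pattern touches a separate sector per thread.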

The Source column on the Source page can now be scrolled horizontally.

The kernel duration gpu__time_duration.sum was added as a column on the Summary page.

Improved the performance of application replay when not all kernels in the application are profiled.

NVIDIA Nsight Compute CLI

Added a new --app-replay-match option to select the mechanism used for matching kernel instances across application replay passes.

An error is shown if --nvtx-include/exclude are used without --nvtx.
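For scripts that assemble profiler command lines, the same rule can be checked up front. The Python sketch below mirrors the constraint stated above; it is a hypothetical helper, not the CLI's own implementation, and the filter value and application path are placeholders.

```python
NVTX_FILTER_OPTIONS = ("--nvtx-include", "--nvtx-exclude")


def check_nvtx_options(args):
    """Raise ValueError if an NVTX filter is given without --nvtx."""
    # Accept both "--opt value" and "--opt=value" spellings.
    option_names = [arg.split("=", 1)[0] for arg in args]
    uses_filter = any(name in NVTX_FILTER_OPTIONS for name in option_names)
    if uses_filter and "--nvtx" not in option_names:
        raise ValueError("--nvtx-include/--nvtx-exclude require --nvtx")
    return args


# Placeholder arguments; "my_range" and "./app" are made up for illustration.
valid_args = ["--nvtx", "--nvtx-include", "my_range", "./app"]
check_nvtx_options(valid_args)

try:
    check_nvtx_options(["--nvtx-include", "my_range", "./app"])
except ValueError as err:
    print(err)
```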

Resolved Issues

The Grid Size column on the Raw page now shows the CUDA grid size like the Launch Statistics section, rather than the combined grid and block sizes.

The Branch Resolving warp stall reason was added to the PC sampling metric groups and the Warp State Statistics section.

The API Stream tool window shows kernel names according to the selected Function Name Mode.

Fixed an issue where an incorrect line could be shown after a heatmap selection on the Source page.

Fixed incorrect metric usage for system memory in the Memory Chart. Previously, all memory requested by L2 from system memory was reported, instead of only the portion that missed in L2.

