Updates in 2021.3.1

Resolved Issues
  • Fixed that kernels with the same name and launch configuration were in some scenarios associated with the wrong profiling results during application replay.
  • Fixed an issue with binary forward compatibility of the report format.
  • Fixed an issue with applications calling into the CUDA API during process teardown.
  • Fixed an issue with profiling applications that use pre-CUDA API 3.1 contexts.
  • Fixed a crash when resolving files on the Source page.
  • Fixed that opening reports with large embedded CUBINs would hang the UI.
  • Fixed an issue with remote profiling on a target where the UI is already launched.

Updates in 2021.3.0

  • Added support for the CUDA toolkit 11.5.
  • Added a new rule for detecting inefficient memory access patterns in the L1TEX cache and L2 cache.
  • Added a new rule for detecting high usage of system or peer memory.
  • Added the new IAction::sass_by_pc function to the NvRules API.
  • The Python-based report interface is now also available for Windows and macOS hosts.
  • Added Hierarchical Roofline section files in a new "roofline" section set.
  • Added support for collecting CPU call stack information.
NVIDIA Nsight Compute
  • Added support for new remote profiling SSH connection and authentication options as well as local SSH configuration files.
  • Added an Occupancy Calculator which can be opened directly from a profile report or as a new activity. It offers feature parity with the CUDA Occupancy Calculator spreadsheet.
  • Added a new Baselines tool window to manage (hide, update, re-order, save/load) baseline selections.
  • The Source page views now support multi-line/cell selection and copy/paste. Different colors are used for highlighting selections and correlated lines.
  • The search edit on the Source page now supports Shift+Enter to search in reverse direction.
  • The Memory Workload Analysis Chart can be configured to show throughput values instead of transferred bytes.
  • The Profile activity now supports the --devices option.
  • The NVLink Topology diagram displays per-NVLink metrics.
  • Added a new tool window showing the CPU call stack at the location where the current thread was suspended during interactive profiling activities.
  • If enabled, the Call Stack / NVTX page of the profile report shows the captured CPU call stack for the selected kernel launch.
NVIDIA Nsight Compute CLI
  • Added support for printing source/metric content with the new --page source and --print-source command line options.
  • Added the new --call-stack option to enable collecting the CPU call stack for every profiled kernel launch.
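
As a rough illustration of how the new command line options might be combined, the following Python sketch only assembles the argument lists for two ncu invocations and does not run them. The helper names, the application path ./my_app, the report name my_report, and the "sass" value passed to --print-source are assumptions made for this example; consult ncu --help for the values your version accepts.

```python
import shlex


def build_profile_command(app_argv, report_path, call_stack=True):
    """Return an argv list for an ncu profiling run (nothing is executed).

    Hypothetical helper: only the option names come from the release notes
    and the ncu CLI; the function itself is illustrative.
    """
    cmd = ["ncu", "--export", report_path]
    if call_stack:
        # Collect the CPU call stack for every profiled kernel launch.
        cmd.append("--call-stack")
    return cmd + list(app_argv)


def build_source_print_command(report_file, view="sass"):
    """Return an argv list that prints the source page of an existing report.

    The default view "sass" is an assumption for this sketch.
    """
    return ["ncu", "--import", report_file,
            "--page", "source", "--print-source", view]


if __name__ == "__main__":
    # Print shell-quoted command lines so they can be reviewed before use.
    print(shlex.join(build_profile_command(["./my_app"], "my_report")))
    print(shlex.join(build_source_print_command("my_report.ncu-rep")))
```

Building the command as a list rather than a single string avoids shell quoting problems if the result is later passed to subprocess.run.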
Resolved Issues
  • Fixed that memory_* metrics could not be collected with the --metrics option.
  • Fixed that selection and copy/paste were not supported for section header tables on the Details page.
  • Fixed issues with the Source page when collapsing the content.
  • Fixed that the UI could crash when applying rules to a new profile result.
  • Fixed that PC Sampling metrics were not available for Profile Series.
  • Fixed that local profiling did not work if no non-loopback address was configured for the system.

For a complete overview of all NVIDIA Nsight™ Compute features and access to resources, please visit the main Nsight™ Compute page.

NVIDIA® Nsight™ Compute 2021.3 is available for download under the NVIDIA Registered Developer Program.
