Nsight Compute 2020.1 - New Features

Memory Workload Analysis showing NVIDIA Ampere Architecture Asynchronous Copy to Shared Memory

Memory Workload Analysis showing NVIDIA Ampere Architecture Compute Data Compression

Roofline Analysis comparing profiling runs of code optimized to near the GPU's full potential (red) and the baseline (purple).
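
For readers new to the roofline model, the roof in such a chart can be sketched as follows: attainable performance is the smaller of the peak compute rate and the product of arithmetic intensity and peak memory bandwidth. The function name and the peak values below are illustrative assumptions, not figures for any specific GPU and not values taken from NVIDIA Nsight Compute.

```python
def roofline_attainable(arithmetic_intensity, peak_flops, peak_bandwidth):
    """Upper bound on FLOP/s for a kernel under the basic roofline model.

    arithmetic_intensity: FLOP per byte moved to or from memory
    peak_flops:           peak compute throughput in FLOP/s
    peak_bandwidth:       peak memory bandwidth in byte/s
    """
    memory_bound = arithmetic_intensity * peak_bandwidth
    return min(peak_flops, memory_bound)


# Hypothetical device: 10 TFLOP/s peak compute, 1 TB/s peak bandwidth.
PEAK_FLOPS = 10e12
PEAK_BW = 1e12

# The ridge point is the intensity at which the two limits meet.
ridge_point = PEAK_FLOPS / PEAK_BW

low = roofline_attainable(2.0, PEAK_FLOPS, PEAK_BW)    # left of the ridge: memory-bound
high = roofline_attainable(50.0, PEAK_FLOPS, PEAK_BW)  # right of the ridge: compute-bound
```

A kernel plotted well below this roof at its arithmetic intensity has headroom; one close to it is near the hardware limit for that intensity.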

Found in CUDA Toolkit 11.0 Update 1

The NVIDIA Nsight Compute installer for Mac is now code-signed and notarized.

Disabled the creation of the Python cache when executing rules to avoid permission issues and signing conflicts.

Fixed the launcher script of the NVIDIA Nsight Compute CLI to no longer fail if uname -p is not available.

Fixed the API parameter capture for function cuDeviceGetLuid.

Found in CUDA Toolkit 11.0 GA

Metrics passed to --metrics on the NVIDIA Nsight Compute CLI or in the respective Profile activity option are automatically expanded to all first-level sub-metrics if required. See the documentation on --metrics for more details.
Added new rules for detecting inefficiencies of using the Compute Data Compression on the NVIDIA Ampere architecture.
The version of the NVIDIA Nsight Compute target collecting the results is shown in the Session page.
Added new launch__grid_dim_[x,y,z] and launch__block_dim_[x,y,z] metrics.
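
The bracket shorthand in the last item stands for one metric per dimension. A minimal sketch of that expansion follows; the helper `expand_bracket_shorthand` is a hypothetical illustration of the notation used in these notes, not part of NVIDIA Nsight Compute.

```python
import re


def expand_bracket_shorthand(pattern):
    """Expand 'name_[a,b,c]' into ['name_a', 'name_b', 'name_c'].

    Hypothetical helper for reading the notation in these release notes.
    A pattern without brackets is returned unchanged as a one-item list.
    """
    match = re.fullmatch(r"(.*)\[([^\]]+)\](.*)", pattern)
    if match is None:
        return [pattern]
    prefix, options, suffix = match.groups()
    return [prefix + opt.strip() + suffix for opt in options.split(",")]


metrics = (expand_bracket_shorthand("launch__grid_dim_[x,y,z]")
           + expand_bracket_shorthand("launch__block_dim_[x,y,z]"))

# Joined with commas, the names form a list suitable for --metrics.
metrics_argument = ",".join(metrics)
```

The resulting string names the six individual metrics, three for the grid dimensions and three for the block dimensions.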

Documented the breakdown: metrics prefix.
Fixed handling of escaped domain delimiters in NVTX filter expressions.
Fixed issues with the occupancy charts for small block sizes.
Fixed an issue when choosing a default report page in the options dialog.
Fixed an issue where the scroll bar could overlap the content when exporting the report page as an image.

Found in CUDA Toolkit 11.0 RC

Added support for the NVIDIA GA100/SM 8.x GPU architecture
Expanded platform support to include Arm SBSA (Server Base System Architecture)
Support for CUDA Toolkit 11.0 was added
Added a rule for reporting uncoalesced memory accesses as part of the Source Counters section
Added support for report name placeholders %p, %q, %i and %h
The Kernel Profiling Guide was added to the documentation
The Special Configurations section was added to the documentation, detailing support for NVIDIA Ampere architecture's Multi-Instance GPU (MIG)
Added support for Visual Studio integration (Windows only)
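
The report name placeholders listed above can be illustrated with a small emulation. This is a sketch under stated assumptions, not the tool's own code: it assumes %p is the process ID, %h the hostname, %q{VAR} the value of environment variable VAR, and %i the lowest positive integer that yields an unused file name. The function `expand_report_name`, the `RANK` variable, and the handling of unset variables (replaced by an empty string) are hypothetical choices made for this example.

```python
import os
import re
import socket


def expand_report_name(template, pid=None, hostname=None, env=None,
                       exists=os.path.exists):
    """Emulate report name placeholder expansion (illustrative only).

    Assumed meanings:
      %p       process ID
      %h       hostname
      %q{VAR}  value of environment variable VAR (empty if unset, an assumption)
      %i       lowest positive integer for which the name is not yet used
    """
    pid = os.getpid() if pid is None else pid
    hostname = socket.gethostname() if hostname is None else hostname
    env = os.environ if env is None else env

    name = template.replace("%p", str(pid)).replace("%h", hostname)
    name = re.sub(r"%q\{(\w+)\}", lambda m: env.get(m.group(1), ""), name)

    if "%i" not in name:
        return name
    counter = 1
    while exists(name.replace("%i", str(counter))):
        counter += 1
    return name.replace("%i", str(counter))


# Pretend one report already exists so that %i has to skip the value 1.
taken = {"report_rank3_node01_1"}
name = expand_report_name(
    "report_rank%q{RANK}_%h_%i",
    hostname="node01",
    env={"RANK": "3"},  # hypothetical variable, e.g. set by an MPI launcher
    exists=lambda path: path in taken,
)
```

Placeholders of this kind are mainly useful when several processes write reports at once and each needs a distinct file name.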

Added support for roofline analysis charts
NVIDIA Ampere architecture enhancements
- Memory Workload Analysis Report now shows Compute Data Compression ratio and amounts
- Memory Workload Analysis Report now shows Asynchronous Copy to shared memory
Added linked hot spot tables in section bodies to indicate performance problems in the source code
Added section navigation links in rule results to quickly jump to the referenced section
Added a new option to select how kernel names are shown in the UI
Added new memory tables for the L1/TEX cache and the L2 cache. The old tables are still available for backwards compatibility and moved to a new section containing deprecated UI elements.
Memory tables now show the metric name as a tooltip
Source resolution now takes into account file properties when selecting a file from disk
Results in the profile report can now be filtered by NVTX range
The Source page now supports collapsing views even for single files
The UI shows profiler error messages as dismissible banners for increased visibility
Improved the baseline name control in the profiler report header
The UI command was renamed from nv-nsight-cu to ncu-ui. Old names remain for backwards compatibility.
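
Scripts that must work with installations from before and after the rename can probe for the new name first and fall back to the old one. The sketch below assumes the commands are on PATH; `resolve_command` is a hypothetical helper, not something shipped with NVIDIA Nsight Compute.

```python
import shutil


def resolve_command(preferred, legacy, which=shutil.which):
    """Return the path of the first command found on PATH, or None."""
    for name in (preferred, legacy):
        path = which(name)
        if path is not None:
            return path
    return None


# Prefer the new UI command name and fall back to the old one.
ui_path = resolve_command("ncu-ui", "nv-nsight-cu")
```

The same helper applies to the CLI pair, with `ncu` as the preferred name and `nv-nsight-cu-cli` as the legacy one.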

The CLI command was renamed from nv-nsight-cu-cli to ncu. Old names remain for backwards compatibility.
Queried metrics on GV100 and newer chips are sorted alphabetically
Multiple instances of NVIDIA Nsight Compute CLI can now run concurrently on the same system, e.g. for profiling individual MPI ranks. Profiled kernels are serialized across all processes using a system-wide file lock.
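
The serialization described in the last item can be pictured with a generic advisory file lock. The notes do not say how the lock is implemented, so the following is only a POSIX-specific sketch of the general technique; the lock path and the function `run_serialized` are hypothetical.

```python
import fcntl  # POSIX only
import os
import tempfile

# Hypothetical lock location shared by all cooperating processes.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "example-profiler.lock")


def run_serialized(work, lock_path=LOCK_PATH):
    """Run work() while holding an exclusive file lock.

    Every process that calls this with the same lock_path waits its turn,
    so at most one work() executes at a time across the whole system.
    """
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks while another process holds the lock
        try:
            return work()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


result = run_serialized(lambda: "profiled one kernel")
```

Because each process blocks until the lock is free, concurrently profiled ranks take turns rather than interfering with each other's measurements.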

Resolved Issues

More C++ kernel names can be properly demangled
Fixed a free(): invalid pointer error when profiling applications using PyTorch > 19.07
Fixed profiling IBM Spectrum MPI applications that require PAMI GPU hooks (--smpiargs="-gpu")
Fixed an issue where the first kernel instruction was missed when computing sass__inst_executed_per_opcode
Reduced surplus DRAM write traffic created from flushing caches during kernel replay
The Compute Workload Analysis section shows the IMMA pipeline on GV11b GPUs
Profile reports now scroll properly on macOS when using a trackpad
Relative output filenames for the Profile activity now use the document directory, instead of the current working directory
Fixed path expansion of ~ on Windows
Memory access information is now shown properly for RED assembly instructions on the Source page
Fixed an issue where user-set PYTHONHOME and PYTHONPATH environment variables were picked up by NVIDIA Nsight Compute, resulting in locale encoding issues.

For a complete overview of all NVIDIA Nsight™ Compute features and access to resources, please visit the main Nsight™ Compute page.

NVIDIA® Nsight™ Compute 2020.1 is available for download under the NVIDIA Registered Developer Program.
