NVIDIA Nsight Compute: Roofline and NVIDIA Ampere GPU Architecture Analysis
This demo shows the latest CUDA kernel analysis capabilities in NVIDIA Nsight Compute, including the popular Roofline Analysis Method and a new feature for the NVIDIA Ampere GPU Architecture. Specifically, we’ll demonstrate profiling the hardware-supported asynchronous data copy feature, which can boost the performance of workloads that are able to take advantage of it.
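For readers unfamiliar with the roofline model mentioned above, the idea can be sketched in a few lines: a kernel's attainable throughput is bounded by the lower of the compute peak and the memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). The sketch below is illustrative only. The function names and the peak figures are hypothetical placeholders, not Nsight Compute output or the specifications of any particular GPU.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound: the lower of the compute roof and the memory roof.

    arithmetic_intensity is in FLOPs per byte, peak_gflops in GFLOP/s,
    and peak_bandwidth_gbs in GB/s.
    """
    memory_roof = arithmetic_intensity * peak_bandwidth_gbs
    return min(peak_gflops, memory_roof)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity at which the memory roof meets the compute roof."""
    return peak_gflops / peak_bandwidth_gbs


# Hypothetical device peaks, chosen only to keep the arithmetic simple.
PEAK_GFLOPS = 10000.0
PEAK_BANDWIDTH_GBS = 1000.0

if __name__ == "__main__":
    ridge = ridge_point(PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
    for ai in (0.5, 10.0, 50.0):
        bound = attainable_gflops(ai, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
        regime = "memory-bound" if ai < ridge else "compute-bound"
        print(f"AI={ai} FLOP/byte -> at most {bound} GFLOP/s ({regime})")
```

A kernel whose arithmetic intensity falls to the left of the ridge point is limited by memory bandwidth, while one to the right is limited by compute throughput.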
NVIDIA Nsight Compute: Feature Spotlight – Application Replay
This demo introduces the new Application Replay capability in NVIDIA Nsight Compute. This feature opens the door for new workloads and workflows to take advantage of the powerful CUDA kernel profiling capabilities in Nsight Compute. You’ll learn how Nsight Compute replays kernels to get accurate performance data and how Application Replay can be used to improve performance and unlock new options for CUDA kernel analysis.
NVIDIA Nsight Systems: Analyzing NCCL Usage with NVIDIA Nsight Systems
NVIDIA Nsight Systems now includes support for tracing NCCL (NVIDIA Collective Communications Library) usage in your CUDA application. This enables users to identify NCCL activity on the CPU timeline and correlate it to the associated GPU CUDA kernels and memory copies. Equipped with this information, users can validate and improve NCCL usage by identifying CPU and GPU cold spots and hot spots. Nsight Systems is the go-to profiling tool for gaining a holistic view of CUDA applications.
NVIDIA Nsight Systems: Optimizing CUDA Memory Allocations Using NVIDIA Nsight Systems
NVIDIA Nsight Systems now traces CUDA memory allocations, helping users check that their application uses memory efficiently. Effective memory management is key to ensuring efficient application performance. With the allocation trace, users can confirm that their application is reclaiming available memory, so that it avoids running out of memory, starving, or stalling. Nsight Systems is the premier profiling tool for gaining a holistic view of CUDA applications.