Technical Walkthrough

Accelerating Scientific Applications in HPC Clusters with NVIDIA DPUs Using the MVAPICH2-DPU MPI Library

HPC and AI have driven supercomputers into wide commercial use as the primary data processing engines enabling research, scientific discoveries, and product development. These systems can carry comple... 7 MIN READ
News

Achieve up to 75% Performance Improvement for Communication Intensive HPC Applications with NVTAGS

NVTAGS automates intelligent GPU assignment by profiling HPC applications and launching them with a custom GPU assignment tailored to an application and system to minimize communication costs. 2 MIN READ
Technical Walkthrough

Fast Multi-GPU collectives with NCCL

Today many servers contain 8 or more GPUs. In principle then, scaling an application from one to many GPUs should provide a tremendous performance boost. 10 MIN READ
Technical Walkthrough

GPU Pro Tip: Track MPI Calls In The NVIDIA Visual Profiler

Often when profiling GPU-accelerated applications that run on clusters, one needs to visualize MPI (Message Passing Interface) calls on the GPU timeline in the… 5 MIN READ
Technical Walkthrough

Benchmarking GPUDirect RDMA on Modern Server Platforms

NVIDIA GPUDirect RDMA is a technology which enables a direct path for data exchange between the GPU and third-party peer devices using standard features of PCI… 13 MIN READ
Technical Walkthrough

CUDA Pro Tip: Profiling MPI Applications

Use nvprof and NVTX to profile your MPI+CUDA application. 4 MIN READ