Tag: MPI

Artificial Intelligence

Fast Multi-GPU collectives with NCCL

Today many servers contain 8 or more GPUs. In principle, then, scaling an application from one GPU to many should provide a tremendous performance boost. 10 MIN READ
Accelerated Computing

GPU Pro Tip: Track MPI Calls In The NVIDIA Visual Profiler

Often when profiling GPU-accelerated applications that run on clusters, one needs to visualize MPI (Message Passing Interface) calls on the GPU timeline in the… 5 MIN READ
Accelerated Computing

Benchmarking GPUDirect RDMA on Modern Server Platforms

NVIDIA GPUDirect RDMA is a technology that enables a direct path for data exchange between the GPU and third-party peer devices using standard features of PCI… 13 MIN READ
Accelerated Computing

CUDA Pro Tip: Profiling MPI Applications

Use nvprof and NVTX to profile your MPI+CUDA application. 4 MIN READ
Accelerated Computing

Benchmarking CUDA-Aware MPI

My last post introduced CUDA-aware MPI, with an overview of MPI and a description of the functionality and benefits of CUDA-aware MPI. In this post I… 8 MIN READ
Accelerated Computing

An Introduction to CUDA-Aware MPI

MPI, the Message Passing Interface, is a standard API for communicating data via messages between distributed processes, and it is commonly used in HPC to build… 11 MIN READ