MPI

Nov 15, 2022

Scaling VASP with NVIDIA Magnum IO

You could make an argument that the history of civilization and technological advancement is the history of the search and discovery of materials. Ages are...

22 MIN READ

Jun 28, 2021

Accelerating Scientific Applications in HPC Clusters with NVIDIA DPUs Using the MVAPICH2-DPU MPI Library

High-performance computing (HPC) and AI have driven supercomputers into wide commercial use as the primary data processing engines enabling research, scientific...

7 MIN READ

Jun 23, 2021

Achieve up to 75% Performance Improvement for Communication Intensive HPC Applications with NVTAGS

Many GPU-accelerated HPC applications spend a substantial portion of their time in non-uniform, GPU-to-GPU communications. Additionally, in many HPC systems,...

2 MIN READ

Figure 5: Ring order of GPUs in PCIe tree.

Apr 07, 2016

Fast Multi-GPU collectives with NCCL

Today many servers contain 8 or more GPUs. In principle then, scaling an application from one to many GPUs should provide a tremendous performance boost. But in...

10 MIN READ

May 05, 2015

GPU Pro Tip: Track MPI Calls In The NVIDIA Visual Profiler

Often when profiling GPU-accelerated applications that run on clusters, one needs to visualize MPI (Message Passing Interface) calls on the GPU timeline in the...

5 MIN READ

Oct 07, 2014

Benchmarking GPUDirect RDMA on Modern Server Platforms

NVIDIA GPUDirect RDMA is a technology which enables a direct path for data exchange between the GPU and third-party peer devices using standard features of PCI...

13 MIN READ

Jun 19, 2014

CUDA Pro Tip: Profiling MPI Applications

When I profile MPI+CUDA applications, sometimes performance issues only occur for certain MPI ranks. To fix these, it's necessary to identify the MPI rank where...

4 MIN READ

Mar 27, 2013

Benchmarking CUDA-Aware MPI

I introduced CUDA-aware MPI in my last post, with an introduction to MPI and a description of the functionality and benefits of CUDA-aware MPI. In this post I...

8 MIN READ

Mar 13, 2013

An Introduction to CUDA-Aware MPI

MPI, the Message Passing Interface, is a standard API for communicating data via messages between distributed processes that is commonly used in HPC to build...

11 MIN READ