A Partitioned Global Address Space Library for Large GPU Clusters

Akhil Langer, NVIDIA | Sreeram Potluri, NVIDIA | James S Dinan

GTC 2020

We'll discuss NVSHMEM, a PGAS library that implements the OpenSHMEM specification for communication across NVIDIA GPUs connected by different types of interconnects, including PCIe, NVLink, and InfiniBand. NVSHMEM makes it possible to initiate communication from within a CUDA kernel. As a result, an application's communication requirements no longer force CUDA kernel boundaries on it. Less synchronization on the CPU improves strong-scaling efficiency, and the ability to initiate fine-grained communication from inside a CUDA kernel helps achieve better overlap of communication with computation. QUDA is a GPU-enabled QCD library used by several popular packages, such as Chroma and MILC, and NVSHMEM enables better strong scaling in QUDA. NVSHMEM benefits not only latency-bound applications like QUDA; it can also improve performance and reduce the complexity of bandwidth-bound codes like FFT and of codes like breadth-first search that have a dynamic communication pattern.
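As a rough illustration of kernel-initiated communication, the sketch below has each processing element (PE) write its rank into a symmetric buffer on its right-hand neighbor from inside a CUDA kernel. It is not taken from the talk. It assumes a recent NVSHMEM release that provides the teams API (`nvshmem_team_my_pe` with `NVSHMEMX_TEAM_NODE`), one GPU per PE, and a launcher that NVSHMEM supports; the kernel name `ring_shift` is made up for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

// Hypothetical kernel: each PE puts its own rank into the symmetric
// buffer of the next PE in a ring, with no CPU involvement in the put.
__global__ void ring_shift(int *destination) {
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    int peer = (mype + 1) % npes;

    // Single-element put issued by the GPU thread itself.
    nvshmem_int_p(destination, mype, peer);
}

int main() {
    nvshmem_init();

    // Assumption: one PE per GPU, so the PE's index within the node
    // can be used to select its device.
    int mype_node = nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE);
    cudaSetDevice(mype_node);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Symmetric allocation: the same buffer exists on every PE.
    int *destination = (int *) nvshmem_malloc(sizeof(int));

    ring_shift<<<1, 1, 0, stream>>>(destination);

    // Stream-ordered barrier: all puts are complete and visible
    // before the copy back to the host is executed.
    nvshmemx_barrier_all_on_stream(stream);

    int msg = -1;
    cudaMemcpyAsync(&msg, destination, sizeof(int),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    printf("PE %d received %d\n", nvshmem_my_pe(), msg);

    nvshmem_free(destination);
    cudaStreamDestroy(stream);
    nvshmem_finalize();
    return 0;
}
```

The put and the barrier are both ordered on the GPU stream, so the host synchronizes only once, when it reads the result. In a real application the kernel would keep computing between puts, which is where the overlap of communication with computation comes from.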
