GTC 2020: A Partitioned Global Address Space Library for Large GPU Clusters
After clicking “Watch Now” you will be prompted to login or join.
Click “Watch Now” to login or join the NVIDIA Developer Program.
A Partitioned Global Address Space Library for Large GPU Clusters
Akhil Langer, NVIDIA | Sreeram Potluri, NVIDIA | firstname.lastname@example.org, James S Dinan
We'll discuss NVSHMEM, a PGAS library that implements the OpenSHMEM specification for communication across NVIDIA GPUs connected by different types of interconnects that include PCI-E, NVLink, and Infiniband. NVSHMEM makes it possible to initiate communication from within a CUDA kernel. As a result, CUDA kernel boundaries are not forced on an application due to its communication requirements. Less synchronization on the CPU helps strong scaling efficiency. Ability to initiate fine-grained communication from inside the CUDA kernel helps achieve better overlap of communication with computation. QUDA is a popular GPU-Enabled QCD library used by several popular packages like Chroma and MILC. NVSHMEM enables better strong scaling in QUDA. NVSHMEM not only benefits latency-bound applications like QUDA, but can also help improve performance and reduce complexity of codes like FFT that are bandwidth-bound, and codes like Breadth First Search that have a dynamic communication pattern.