Overcoming Latency Barriers: Strong Scaling HPC Applications with NVSHMEM

Mathias Wagner, NVIDIA

GTC 2020

For scientific advancement through HPC, ever-increasing simulation capabilities are not the only key to success. Obtaining timely results is often even more important. Reducing the time-to-solution generally requires the application to be strong-scalable. However, carrying improved single-GPU performance over to many GPUs faces many obstacles. We'll show you how to improve strong scaling on systems equipped with NVIDIA GPUs by avoiding or hiding latencies through GPU-centric communication with NVSHMEM, an implementation of OpenSHMEM for GPUs. After introducing NVSHMEM, we'll share best practices gathered from using NVSHMEM in QUDA, a library for Lattice QCD on GPUs used by codes such as MILC and Chroma. We'll show results obtained on fat-GPU nodes such as DGX-1 and DGX-2, as well as results from scaling to 1,000 GPUs on InfiniBand-connected systems, including Summit.
