GTC 2020: Overcoming Latency Barriers: Strong Scaling HPC Applications with NVSHMEM
After clicking “Watch Now” you will be prompted to login or join.
Click “Watch Now” to login or join the NVIDIA Developer Program.
Overcoming Latency Barriers: Strong Scaling HPC Applications with NVSHMEM
Mathias Wagner, NVIDIA
For scientific advancement through HPC, ever-increasing simulation capabilities are not the only key to success. Obtaining timely results is often even more important. Reducing the time-to-solution generally requires the application to be strong-scalable. However, scaling up improved single-GPU performance faces many obstacles. We'll show you how to improve the strong-scaling on systems equipped with NVIDIA GPUs. Avoid or hide latencies by exploiting GPU-centric communication with NVSHMEM, an implementation of OpenSHMEM for GPUs. After introducing NVSHMEM, we'll share best practices gathered from using NVSHMEM for QUDA, a library for Lattice QCD on GPUs used by codes as MILC and Chroma. We show results obtained on fat-GPU nodes like DGX-1/2, as well as scaling them to 1,000 GPUs in InfiniBand-connected systems, including Summit.