GTC Silicon Valley-2019 ID:S9708: Strong Scaling HPC Applications: Best Practices with a Lattice QCD case study

Kate Clark (NVIDIA), Mathias Wagner (NVIDIA)
We'll share what we learned by running QUDA, an open-source library for lattice quantum chromodynamics (LQCD), on modern GPU systems, which provide myriad hardware and software techniques to improve strong scaling. We ran QUDA, which provides GPU acceleration for LQCD applications such as MILC and Chroma, on six generations of GPUs with various network and node configurations, including IBM POWER9- and x86-based systems and NVLink- and NVSwitch-based systems. Based on those experiences, we'll discuss best practices for scaling to hundreds and even thousands of GPUs. We'll also cover peer-to-peer memory access, GPUDirect RDMA, and NVSHMEM, as well as techniques such as auto-tuning kernel launch configurations.

