Accelerating Large-Scale GW Calculations in Material Science
Charlene Yang, NERSC, Lawrence Berkeley National Laboratory | Mauro Del Ben, CRD, LBNL
Learn the balancing act of porting a large-scale HPC code to modern GPUs, where a plethora of architectural characteristics can both accelerate and limit performance. We'll showcase various techniques used to accelerate the materials science code BerkeleyGW on NVIDIA GPUs, targeting large-scale simulations with thousands of atoms, matrices of up to 1 million by 1 million, and reductions of thousands of billions of numbers. These techniques include the use of cuBLAS and cuFFT, pinned memory, streams, batched operations, shared memory, and the overlapping of Message Passing Interface (MPI) communication with GPU computation. Excellent strong and weak scaling is observed on thousands of Volta GPUs, and a 16x improvement in FLOPs/Watt efficiency is obtained over the CPU-only implementation.