High Performance Distributed Deep Learning: A Beginner's Guide
Ammar Ahmad Awan, Ohio State University | Dhabaleswar K (DK) Panda, Ohio State University | Hari Subramoni, Ohio State University
Learn about the current wave of advances in AI and HPC technologies that improve the performance of deep neural network (DNN) training on NVIDIA GPUs. We'll discuss the challenges and opportunities these advances create for HPC and AI researchers. Several modern deep learning (DL) frameworks (Caffe, TensorFlow, CNTK, PyTorch, and others) have emerged that offer ease of use and the flexibility to describe, train, and deploy various types of DNN architectures. We'll provide an overview of trends in DL frameworks from an architectural and performance standpoint. Most DL frameworks have used a single GPU to accelerate DNN training and inference, but approaches to parallelizing training across multiple GPUs are being actively explored. We'll highlight the new challenges that DNN training poses for Message Passing Interface (MPI) runtimes and show how efficient communication primitives in MVAPICH2-GDR can support scalable DNN training. Finally, we'll discuss how we scaled training of ResNet-50 with TensorFlow to 1,536 GPUs using MVAPICH2-GDR.
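To make the role of communication primitives concrete: in data-parallel training, each GPU computes gradients on its own mini-batch, and the workers then average those gradients with an allreduce before updating the model. The sketch below simulates that step in plain Python with a ring schedule, one common allreduce algorithm. It is an illustration only, not the MVAPICH2-GDR implementation or the method used in the talk; the function names `ring_allreduce_sum` and `average_gradients` are hypothetical, and an MPI runtime would expose this operation through `MPI_Allreduce` instead.

```python
# Simulated data-parallel gradient averaging (no MPI or GPU required).
# Each "worker" stands in for one GPU holding its own local gradient.
# All names here are hypothetical and chosen for illustration.


def ring_allreduce_sum(buffers):
    """Sum equal-length vectors across workers using a ring schedule.

    buffers[r] is the local vector of worker r. Returns new per-worker
    vectors, each holding the element-wise sum over all workers.
    """
    p = len(buffers)
    n = len(buffers[0])
    data = [list(b) for b in buffers]
    # Split the vector into p contiguous chunks (some may be empty).
    bounds = [(i * n) // p for i in range(p + 1)]

    def chunk(i):
        k = i % p
        return range(bounds[k], bounds[k + 1])

    # Phase 1, reduce-scatter: after p - 1 steps, worker r holds the
    # complete sum for chunk (r + 1) % p.
    for s in range(p - 1):
        # Snapshot all payloads first so the sends behave as simultaneous.
        sends = [[data[r][j] for j in chunk(r - s)] for r in range(p)]
        for r, payload in enumerate(sends):
            dst = (r + 1) % p
            for j, v in zip(chunk(r - s), payload):
                data[dst][j] += v

    # Phase 2, allgather: circulate the completed chunks around the ring.
    for s in range(p - 1):
        sends = [[data[r][j] for j in chunk(r + 1 - s)] for r in range(p)]
        for r, payload in enumerate(sends):
            dst = (r + 1) % p
            for j, v in zip(chunk(r + 1 - s), payload):
                data[dst][j] = v

    return data


def average_gradients(local_grads):
    """Return, for every worker, the gradient averaged over all workers."""
    p = len(local_grads)
    return [[v / p for v in summed] for summed in ring_allreduce_sum(local_grads)]


if __name__ == "__main__":
    # Three simulated workers, each with a five-element gradient.
    grads = [
        [1.0, 2.0, 3.0, 4.0, 5.0],
        [2.0, 4.0, 6.0, 8.0, 10.0],
        [3.0, 6.0, 9.0, 12.0, 15.0],
    ]
    for rank, g in enumerate(average_gradients(grads)):
        print(rank, g)
```

In the ring schedule each worker sends and receives only about twice the gradient size in total, regardless of the number of workers, which is one reason allreduce performance matters so much when training scales to many GPUs.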