High Performance Distributed Deep Learning: A Beginner's Guide

Ammar Ahmad Awan, Ohio State University | Dhabaleswar K (DK) Panda, Ohio State University | Hari Subramoni, Ohio State University

GTC 2020

Learn about the current wave of advances in AI and HPC technologies that improve the performance of deep neural network (DNN) training on NVIDIA GPUs. We'll discuss many exciting challenges and opportunities for HPC and AI researchers. Several modern deep learning (DL) frameworks (Caffe, TensorFlow, CNTK, PyTorch, and others) have emerged that offer ease of use and flexibility to describe, train, and deploy various types of DNN architectures. We'll provide an overview of interesting trends in DL frameworks from an architectural and performance standpoint. Most DL frameworks have utilized a single GPU to accelerate DNN training and inference. However, approaches to parallelize training are being actively explored. We'll highlight new challenges for Message Passing Interface (MPI) runtimes in efficiently supporting DNN training, and show how efficient communication primitives in MVAPICH2-GDR can support scalable DNN training. Finally, we'll discuss how we scale training of ResNet-50 using TensorFlow to 1,536 GPUs with MVAPICH2-GDR.
