GTC 2020: AdaSum: Adaptive Summation of Gradients




AdaSum: Adaptive Summation of Gradients for Deep Learning

Saeed Maleki, Microsoft

GTC 2020

AdaSum is a new algorithm for parallelized gradient aggregation based on the notion of sound model combiners. It brings the accuracy of parallel/distributed Stochastic Gradient Descent closer to that of sequential Stochastic Gradient Descent, yielding faster convergence. We've added AdaSum as a new gradient aggregation option for Horovod and are currently working with the Horovod team on a pull request to contribute it upstream. The algorithm chooses different communication strategies depending on the underlying hardware configuration (such as GPUs, NVLink, and network topology) to maximize bandwidth utilization. We'll discuss AdaSum in detail, walk through an example with Horovod, and present performance benchmarks demonstrating the total training-time savings for different deep learning models.
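The abstract does not spell out the combining rule, so the sketch below assumes the pairwise form published for AdaSum: two gradients g1 and g2 are merged as (1 - g1·g2 / (2‖g1‖²))·g1 + (1 - g1·g2 / (2‖g2‖²))·g2. Orthogonal gradients are simply summed, while identical gradients are averaged, which is what lets a parallel step behave more like a sequential one. The helper names `adasum_pair` and `adasum_reduce` are hypothetical, and the plain binary tree in `adasum_reduce` is a simplification; it does not model the hardware-aware communication schedule the talk describes.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def adasum_pair(g1: Vector, g2: Vector) -> Vector:
    """Combine two gradients with the (assumed) pairwise AdaSum rule:
    (1 - g1.g2 / (2|g1|^2)) * g1 + (1 - g1.g2 / (2|g2|^2)) * g2
    """
    d = dot(g1, g2)
    n1 = dot(g1, g1)
    n2 = dot(g2, g2)
    # A zero gradient contributes nothing, so the result is just the other one.
    # This also avoids dividing by a zero squared norm.
    if n1 == 0.0:
        return list(g2)
    if n2 == 0.0:
        return list(g1)
    c1 = 1.0 - d / (2.0 * n1)  # shrink g1 by its overlap with g2
    c2 = 1.0 - d / (2.0 * n2)  # shrink g2 by its overlap with g1
    return [c1 * x + c2 * y for x, y in zip(g1, g2)]


def adasum_reduce(grads: List[Vector]) -> Vector:
    """Hypothetical helper: combine per-worker gradients pairwise in a binary tree."""
    if len(grads) == 1:
        return list(grads[0])
    mid = len(grads) // 2
    return adasum_pair(adasum_reduce(grads[:mid]), adasum_reduce(grads[mid:]))


if __name__ == "__main__":
    print(adasum_pair([1.0, 0.0], [0.0, 1.0]))  # orthogonal: summed
    print(adasum_pair([2.0, 0.0], [2.0, 0.0]))  # identical: averaged
```

With orthogonal inputs the call returns [1.0, 1.0], the plain sum; with identical inputs it returns [2.0, 0.0], the average. Ordinary summation would instead double the step in the second case.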
