Distributed Deep Learning with Horovod
Travis Addair, Uber Technologies
GTC 2020
Although frameworks like TensorFlow and PyTorch simplify the design and training of deep learning models, difficulties usually arise when scaling a model to multiple GPUs in a server or multiple servers in a cluster. We'll show how to scale distributed training of TensorFlow, PyTorch, and MXNet models with Horovod, a library designed to make distributed training fast and easy to use. We'll explain how Horovod takes a model designed for a single GPU and trains it on a cluster of GPU servers with just a few additional lines of Python code. We'll also explore how Horovod has been used across the industry to scale training to hundreds of GPUs, and the techniques used to maximize training performance.
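As a rough sketch of the pattern the abstract describes (not the speaker's exact code), the handful of lines Horovod adds to a single-GPU PyTorch training script typically look like this; the tiny `Linear` model is a hypothetical stand-in for a real network:

```python
import torch
import horovod.torch as hvd

hvd.init()                                    # initialize Horovod (one process per GPU)
torch.cuda.set_device(hvd.local_rank())       # pin this process to its local GPU

model = torch.nn.Linear(10, 1).cuda()         # stand-in for a real model
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01 * hvd.size())  # scale learning rate by worker count

# Wrap the optimizer so gradients are averaged across all workers via allreduce.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())

# Broadcast initial parameters and optimizer state so every worker starts identically.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```

The script is then launched across workers with Horovod's launcher, e.g. `horovodrun -np 4 python train.py` for four GPUs on one machine; the training loop itself stays essentially unchanged.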