
GTC Silicon Valley 2019, ID S9321: Distributed Deep Learning with Horovod

Alex Sergeev (Uber Technologies, Inc.)
Learn how to scale distributed training of TensorFlow and PyTorch models with Horovod, a library designed to make distributed training fast and easy to use. Although frameworks like TensorFlow and PyTorch simplify the design and training of deep learning models, difficulties usually arise when scaling models to multiple GPUs in a server or multiple servers in a cluster. We'll explain the role of Horovod in taking a model designed on a single GPU and training it on a cluster of GPU servers.
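The "single GPU to cluster" workflow the abstract describes follows a well-known recipe in Horovod's documented PyTorch API. A minimal sketch (the model and learning rate here are placeholders, not from the talk):

```python
# Sketch of the standard Horovod recipe for PyTorch, assuming Horovod is
# installed with GPU support; the toy model and hyperparameters are illustrative.
import torch
import horovod.torch as hvd

hvd.init()                               # start Horovod: one process per GPU
torch.cuda.set_device(hvd.local_rank()) # pin each process to its local GPU

model = torch.nn.Linear(10, 1).cuda()
# Common practice: scale the learning rate by the number of workers
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Wrap the optimizer so gradients are averaged across workers (ring-allreduce)
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())

# Broadcast initial state from rank 0 so every worker starts identically
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

# ...then run the usual single-GPU training loop unchanged.
```

A script written this way is launched across GPUs or servers with Horovod's runner, e.g. `horovodrun -np 4 python train.py` for four workers.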

View the slides (pdf)