GTC 2020: Distributed Machine Learning on Virtualized Servers
Luke Wignall, NVIDIA | Mohan Potheri, VMware | Boris Kovalev, Mellanox
Horovod is a distributed training framework for deep learning that can leverage GPUs. We'll talk about a joint project between NVIDIA, Mellanox, and VMware to build a high-performance platform for it using NVIDIA vCompute Server, Mellanox-based high-speed networking, and vSphere PVRDMA (paravirtual RDMA). We'll compare the results of common benchmarks run with and without PVRDMA. We'll also discuss a reference architecture for using vCompute Server for ML.