GTC 2020: From Training to Inference: Maximizing Resource Usage and Reducing Cost with GPU Virtualization on VMware vSphere
Raj Rao, NVIDIA | Uday Kurkure, VMware | Lan Vu, VMware
As machine learning and artificial intelligence are increasingly adopted across industries, their share of data center workloads is growing. We'll present use cases for optimizing data center cost and resource usage for ML on VMware vSphere with GPU virtualization, especially with NVIDIA GRID. We'll discuss the differences in resource utilization between training and inference, and showcase techniques to maximize the benefits of GPUs for your deep-learning workloads. These techniques include sharing a GPU among multiple concurrent users or workloads, using GPU scheduling policies, and optimizing for training and inference in cloud environments. We'll demonstrate how we applied these techniques in our real-world ML/AI applications at VMware and how they helped us further improve the performance of these applications, enabling real-time analytics while reducing deployment cost with the latest Volta/Turing GPUs and NVIDIA GRID.