
GTC Silicon Valley 2019, Session S9726: Unified Memory for Data Analytics and Deep Learning

Chirayu Garg (NVIDIA), Nikolay Sakharnykh (NVIDIA)
Unified Memory significantly improves productivity, while explicit memory management often provides better performance. We'll examine the performance of Unified Memory applications from key AI domains and describe memory-optimization techniques that help you strike the right balance between productivity and performance when developing applications. For data analytics, Unified Memory is designed to keep frequently accessed data in GPU memory. We'll analyze the performance of large analytics workloads and review the bottlenecks of GPU oversubscription on PCIe and NVLink systems. We'll also discuss results from our study integrating Unified Memory into PyTorch for training deep neural networks. We found that Unified Memory matches explicit cudaMalloc for workloads that fit in GPU memory. In addition, applications can oversubscribe the GPU, which makes it possible to use larger batch sizes or train deeper models.
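
As a minimal sketch of the programming-model difference the session compares (not taken from the talk; the buffer size and kernel are illustrative), the example below allocates a buffer with cudaMallocManaged instead of cudaMalloc. The same pointer is then used from both CPU and GPU with no explicit cudaMemcpy, and the allocation is allowed to exceed the GPU's physical memory, which is what enables the oversubscription scenarios mentioned above.

#include <cstdio>
#include <cuda_runtime.h>

// Kernel that touches every element, causing managed pages to migrate to the GPU on demand.
__global__ void scale(float *data, size_t n, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    // Illustrative size: 4 GiB of floats. Raise this past your GPU's memory
    // capacity to exercise oversubscription (the allocation still succeeds).
    size_t n = size_t(1) << 30;
    float *data = nullptr;

    // One pointer, usable from both host and device; pages migrate on demand.
    cudaError_t err = cudaMallocManaged(&data, n * sizeof(float));
    if (err != cudaSuccess) {
        printf("allocation failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Initialize on the CPU; no explicit host-to-device copy is needed.
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;

    // Launch on the GPU; the driver migrates pages as the kernel touches them.
    int threads = 256;
    int blocks = (int)((n + threads - 1) / threads);
    scale<<<blocks, threads>>>(data, n, 2.0f);
    cudaDeviceSynchronize();

    // Read back on the CPU, again without an explicit copy.
    printf("data[0] = %f\n", data[0]);
    cudaFree(data);
    return 0;
}

With explicit memory management, the same program would need separate host and device buffers plus cudaMemcpy transfers and could not allocate more than the GPU's physical memory; the talk examines when that extra work pays off in performance.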

View the slides (pdf)