GTC-DC 2019: GPU and AI as a Service: Driving Productivity and Increasing Utilization

Yaron Haviv, Iguazio
We’ll demonstrate how to use GPUs and AI to build machine learning applications more easily. The data science itself is not the hard part; the operations around it are: deploying tools, integrating hardware, setting up data and machine learning frameworks, running jobs at scale, and reproducing results. GPUs accelerate performance but introduce problems of their own, such as resource sharing, software dependencies, and data bottlenecks. In a cloud-native era, data scientists want a GPU-powered, open source machine learning platform as a service along the lines of Amazon SageMaker or Google AI, without vendor lock-in or on-premises software. We’ll show how to integrate Kubernetes, Kubeflow, high-speed data layers, and GPU-powered servers to build self-service, multi-user machine learning platforms. We’ll also demonstrate how to pool GPUs to maximize utilization and increase scalability, use RAPIDS for 10x faster data processing, and integrate GPUs into the rest of the machine learning stack.