GTC-DC 2019: Multi-Node Multi-GPU Machine Learning with RAPIDS cuML

Corey Nolet, NVIDIA; Joe Eaton, NVIDIA
We’ll discuss the RAPIDS ecosystem, which accelerates the data science workflow by keeping data and computation on GPUs, making it possible to go from ingestion to insight more quickly and with larger workloads. Within RAPIDS, cuML provides a scikit-learn-like application programming interface (API) and cuGraph a NetworkX-like API of GPU-accelerated algorithms. While speedups of over 100x are possible on a single GPU, scale is bounded by the device’s available memory. By spreading work across multiple GPUs on multiple nodes, cuML and cuGraph can increase speedups even further while providing avenues to scale both up and out. We’ll focus on how we enabled training and inference of machine learning and graph models on multiple nodes within cuML and cuGraph, and provide an architectural overview of our communications API, which enables direct GPU-to-GPU memory transfers. We’ll conclude with examples and benchmarks.
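To illustrate the scikit-learn-style API scaled out over Dask, here is a minimal sketch of multi-GPU K-Means with cuML. The module paths, parameters, and cluster setup are assumptions based on RAPIDS releases from around that period (not taken from the session itself), so exact names may differ between versions.

```python
# Minimal sketch: multi-GPU K-Means with cuML's Dask API.
# Assumes a RAPIDS installation providing cuml, dask_cuda, and dask.distributed;
# module paths may vary between RAPIDS releases.
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

from cuml.dask.datasets import make_blobs
from cuml.dask.cluster import KMeans

if __name__ == "__main__":
    # One Dask worker per local GPU; a dask-scheduler plus dask-cuda-worker
    # deployment extends the same code across multiple nodes.
    cluster = LocalCUDACluster()
    client = Client(cluster)

    # Synthetic data generated directly on the GPUs as a distributed array.
    X, _ = make_blobs(n_samples=1_000_000, n_features=20, centers=8)

    # The estimator mirrors scikit-learn's interface: fit, then predict.
    km = KMeans(n_clusters=8)
    km.fit(X)
    labels = km.predict(X)

    client.close()
    cluster.close()
```

The same fit/predict pattern runs unchanged whether the Dask cluster spans one GPU or many nodes, which is the scale-up/scale-out path the abstract describes.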