GTC Silicon Valley-2019 ID:S91025:Data Loading: the Next Frontier in Scale-out Deep Learning (Presented by Pure Storage)
Emily Watkins (Pure Storage)
In this talk you will learn how to create efficient input pipelines tailored to your training data. As the number of projects, the number of GPUs, and data sizes increase, no one-size-fits-all input pipeline can keep GPUs fed with data. We will examine the relationship between training throughput and image representation. We'll provide guidance on the tradeoffs between pre-processing datasets and in-line data processing, and we'll review results from a distributed training environment with multiple NVIDIA DGX-1s and a Pure Storage FlashBlade to highlight the performance impact at scale. Learn how to minimize time to accuracy and, ultimately, time to shipping models.