Tuning GPU Server for DL Performance
Frank Han, Dell | Rengan Xu, Dell
The MLPerf training benchmark is a software suite for measuring how fast systems can train models to a target quality metric. Version 0.6 covers deep-learning models for image classification, object detection, translation, and reinforcement learning. We'll use those subtests to demonstrate how different hardware configurations (CPU core count vs. frequency, memory frequency of 2666 vs. 2933 MHz, PCIe vs. NVLink) and storage options (local SSD, U.2 NVMe, Isilon, and Lustre) affect those DL training workloads. We'll also discuss our work to characterize MLPerf benchmark performance using profiling tools (GPU, CPU, memory, and I/O), our hyperparameter tuning (batch size, learning rate, SGD optimizer), and our study of how software environments (OS versions, CUDA drivers, Docker versions, NCCL P2P levels, NCCL tree vs. ring, etc.) affect MLPerf performance on both single and distributed systems.
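As a rough illustration of the kind of software-environment and hyperparameter sweep described above, the sketch below enumerates run configurations in Python. It is not the presenters' tooling. `NCCL_ALGO` and `NCCL_P2P_LEVEL` are documented NCCL environment variables, but the specific values swept, the batch sizes, and the `build_sweep` helper are assumptions made for this example.

```python
import itertools

# NCCL_ALGO and NCCL_P2P_LEVEL are real NCCL environment variables.
# The value lists below are assumed for illustration, not taken from the talk.
NCCL_ALGOS = ["Ring", "Tree"]
NCCL_P2P_LEVELS = ["NVL", "PIX", "PHB", "SYS"]
BATCH_SIZES = [64, 128, 256]  # hypothetical per-GPU batch sizes


def build_sweep(algos, p2p_levels, batch_sizes):
    """Return one dict per run: NCCL environment overrides plus a batch size."""
    runs = []
    for algo, level, batch in itertools.product(algos, p2p_levels, batch_sizes):
        runs.append({
            "env": {"NCCL_ALGO": algo, "NCCL_P2P_LEVEL": level},
            "batch_size": batch,
        })
    return runs


# Full Cartesian product of the three settings.
sweep = build_sweep(NCCL_ALGOS, NCCL_P2P_LEVELS, BATCH_SIZES)
```

Each entry's `env` dictionary could be merged into the process environment before launching a training job, so that every run differs from the others only in the settings under study.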