PaddlePaddle with Distributed Training API, Automatic Mixed Precision, and TensorRT Integration
Bai-Cheng Jeng, NVIDIA | Jie Fang, NVIDIA | Daming Lu, Baidu USA
We'll introduce PaddlePaddle (PArallel Distributed Deep LEarning), an easy-to-use, efficient, flexible, and scalable deep-learning platform that has already been deployed in real business scenarios. In the training phase, PaddlePaddle provides a high-level API for distributed training named Fleet, which can distribute a training task across a GPU cluster. To further increase performance, PaddlePaddle can enable mixed precision training with only a few added lines of code, achieving significant speedup on Tensor Cores. In the inference phase, PaddlePaddle integrates TensorRT to fuse operations and can run models in lower-precision modes to fully utilize GPU resources. With these three technologies, developers can significantly cut the time needed to train large-scale tasks and deploy models to production.
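As a rough illustration of why mixed precision training usually needs care beyond casting to FP16, the sketch below shows a small gradient underflowing in half precision and being preserved by loss scaling. It is not PaddlePaddle code and does not use the Fleet or AMP APIs; the helper `to_fp16`, the gradient value, and the scale factor of 1024 are assumptions chosen only for this example.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value.

    Hypothetical helper for illustration; uses the standard library's
    half-precision format code 'e' to emulate FP16 storage.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Assumed example values: a gradient smaller than the smallest positive
# FP16 number (about 5.96e-8) and a constant loss scale.
true_grad = 1e-8
loss_scale = 1024.0

# Stored directly in FP16, the gradient underflows to zero.
naive = to_fp16(true_grad)

# Scaling before the FP16 cast keeps the value representable;
# dividing afterwards (in higher precision) recovers the gradient.
scaled = to_fp16(true_grad * loss_scale)
recovered = scaled / loss_scale

print("naive FP16 gradient:    ", naive)
print("recovered after scaling:", recovered)
```

Frameworks with automatic mixed precision typically handle this scaling and unscaling internally, which is part of what lets a few lines of code be enough to turn the feature on.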