IBM Research unveiled a “Distributed Deep Learning” (DDL) library that enables cuDNN-accelerated deep learning frameworks such as TensorFlow, Caffe, Torch, and Chainer to scale across tens of IBM servers and hundreds of GPUs.
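The article does not detail DDL's internals, but libraries of this kind typically build on synchronous data-parallel training: each GPU computes gradients on its own shard of data, and an allreduce averages those gradients so every worker applies the same update. The sketch below is a conceptual illustration only, not IBM's DDL API; the worker count, gradient values, and the `allreduce_mean` helper are all hypothetical.

```python
# Conceptual sketch of the gradient averaging step in synchronous
# data-parallel training. Real libraries (DDL, NCCL, MPI) perform this
# allreduce over the network between GPUs; here it is simulated in-process.

def allreduce_mean(grads_per_worker):
    """Average each gradient element across all workers (what an allreduce computes)."""
    n_workers = len(grads_per_worker)
    n_params = len(grads_per_worker[0])
    return [
        sum(worker[i] for worker in grads_per_worker) / n_workers
        for i in range(n_params)
    ]

# Four hypothetical workers, each holding local gradients for two parameters,
# computed on its own shard of the training data.
local_grads = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8],
]

averaged = allreduce_mean(local_grads)
# Every worker then applies the same averaged gradient, keeping model
# replicas in sync while the per-step data throughput scales with worker count.
print(averaged)
```

Scaling efficiency in such systems hinges on how fast this allreduce runs relative to the compute step, which is why the communication library is the interesting part of the announcement.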
“With the DDL library, it took us just 7 hours to train ImageNet-22K using ResNet-101 on 64 IBM Power Systems servers that have a total of 256 NVIDIA P100 GPU accelerators in them,” said Sumit Gupta, VP, HPC, AI & Machine Learning at IBM Cognitive Systems. “16 days down to 7 hours changes the workflow of data scientists. That’s a 58x speedup!”
According to the researchers’ paper, the team set deep learning records for image recognition accuracy and training time using the new library and 256 GPUs.
A technical preview of DDL is available in version 4 of IBM’s PowerAI enterprise deep learning software, making this cluster-scaling capability available to any organization that uses deep learning to train its AI models.
Scaling TensorFlow and Caffe to 256 GPUs
Aug 08, 2017
