IBM Research unveiled a “Distributed Deep Learning” (DDL) library that enables cuDNN-accelerated deep learning frameworks such as TensorFlow, Caffe, Torch, and Chainer to scale across tens of IBM servers and hundreds of GPUs.
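The article does not detail DDL's internals, but libraries of this kind typically build on synchronous data-parallel training: each GPU computes gradients on its own shard of data, and an allreduce averages those gradients so every worker applies the same update. The sketch below is a conceptual illustration only, not IBM's DDL API; the worker count, gradient values, and the `allreduce_mean` helper are all hypothetical.

```python
# Conceptual sketch of the gradient averaging step in synchronous
# data-parallel training. Real libraries (DDL, NCCL, MPI) perform this
# allreduce over the network between GPUs; here it is simulated in-process.

def allreduce_mean(grads_per_worker):
    """Average each gradient element across all workers (what an allreduce computes)."""
    n_workers = len(grads_per_worker)
    n_params = len(grads_per_worker[0])
    return [
        sum(worker[i] for worker in grads_per_worker) / n_workers
        for i in range(n_params)
    ]

# Four hypothetical workers, each holding local gradients for two parameters,
# computed on its own shard of the training data.
local_grads = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8],
]

averaged = allreduce_mean(local_grads)
# Every worker then applies the same averaged gradient, keeping model
# replicas in sync while the per-step data throughput scales with worker count.
print(averaged)
```

Scaling efficiency in such systems hinges on how fast this allreduce runs relative to the compute step, which is why the communication library is the interesting part of the announcement.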
“With the DDL library, it took us just 7 hours to train ImageNet-22K using ResNet-101 on 64 IBM Power Systems servers that have a total of 256 NVIDIA P100 GPU accelerators in them,” said Sumit Gupta, VP, HPC, AI & Machine Learning at IBM Cognitive Systems. “16 days down to 7 hours changes the workflow of data scientists. That’s a 58x speedup!”
According to the researchers’ paper, the team set deep learning records for image recognition accuracy and training time using the new library and 256 GPUs.
A technical preview of DDL is available in version 4 of IBM’s PowerAI enterprise deep learning software, making this cluster-scaling capability available to any organization that uses deep learning to train its AI models.
Scaling TensorFlow and Caffe to 256 GPUs
Aug 08, 2017
