
Scaling Keras Model Training to Multiple GPUs

Keras is a powerful deep learning meta-framework that sits on top of existing frameworks such as TensorFlow and Theano. Keras is highly productive for developers: defining a model often takes about 50% less code than the native APIs of the underlying frameworks require. This productivity has made it very popular as a university and MOOC teaching tool, and as a rapid prototyping platform for applied researchers and developers.
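As a rough illustration of that conciseness, a small classifier can be defined and compiled in about a dozen lines with Keras’s Sequential API. This is a minimal sketch; the layer sizes are arbitrary and chosen only for illustration:

    from keras.models import Sequential
    from keras.layers import Dense, Activation

    # A small fully connected classifier in a handful of lines.
    # Layer sizes are arbitrary, for illustration only.
    model = Sequential()
    model.add(Dense(512, input_dim=784))  # e.g. flattened 28x28 images
    model.add(Activation('relu'))
    model.add(Dense(10))                  # 10 output classes
    model.add(Activation('softmax'))

    # A single call configures the loss, optimizer, and metrics.
    model.compile(loss='categorical_crossentropy',
                  optimizer='sgd',
                  metrics=['accuracy'])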
Unfortunately, Keras has been slow for single-GPU training and inference, regardless of the backend. It is also hard to make it work on multiple GPUs without breaking its framework-independent abstraction.
Can these shortcomings be addressed while keeping Keras’s high-level API, and still achieving good single-GPU performance and multi-GPU scaling? It turns out that the answer is yes, thanks to the MXNet backend for Keras and MXNet’s efficient data pipeline. Last week, the MXNet community introduced a release candidate for MXNet v0.11.0 with support for Keras v1.2.
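Keras chooses its backend at import time, so switching to MXNet is a configuration change rather than a code change. Here is a minimal sketch, assuming the MXNet-enabled Keras fork is installed (stock Keras 1.2 does not ship an mxnet backend):

    import os

    # Select the backend before keras is first imported. The same setting
    # can be made permanent in ~/.keras/keras.json ("backend": "mxnet").
    os.environ['KERAS_BACKEND'] = 'mxnet'

    import keras  # prints the active backend on import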

Figure: ResNet-50 training throughput (images per second) comparing Keras using the MXNet backend (green bars) to a native MXNet implementation (blue bars).

In a new NVIDIA Developer Blog post, Marek Kolodziej shows how to use Keras with the MXNet backend to achieve high performance and excellent multi-GPU scaling. As a motivating example, he walks through building a fast and scalable ResNet-50 model in Keras.
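The appeal of this approach is that the model definition stays unchanged and device placement surfaces only at compile time. The sketch below illustrates that pattern; note that the context argument is an extension provided by the MXNet-backend fork of Keras, not stock Keras, and its exact form here is an assumption rather than a quote from the full post:

    from keras.models import Sequential
    from keras.layers import Dense, Activation

    # Build any Keras model as usual (a toy model stands in for ResNet-50).
    model = Sequential()
    model.add(Dense(512, input_dim=784))
    model.add(Activation('relu'))
    model.add(Dense(10))
    model.add(Activation('softmax'))

    # Hand the MXNet backend a list of devices at compile time.
    # The 'context' keyword is specific to the MXNet-backend fork of Keras;
    # the device-string format below is an assumption for illustration.
    num_gpus = 4
    model.compile(loss='categorical_crossentropy',
                  optimizer='sgd',
                  metrics=['accuracy'],
                  context=['gpu(%d)' % i for i in range(num_gpus)])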
