Google Open-Sources Image Captioning Intelligence

Google has released the latest version of its automatic image captioning model, Show and Tell, which is more accurate and much faster to train than the original system.
“The TensorFlow implementation released today achieves the same level of accuracy with significantly faster performance: time per training step is just 0.7 seconds in TensorFlow compared to 3 seconds in DistBelief (a system Google previously used for generating image captions) on an NVIDIA K20 GPU, meaning that total training time is just 25 percent of the time previously required,” Chris Shallue, a software engineer on the Google Brain team, wrote in a blog post.
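The 25 percent figure in the quote appears to be a rounded value. As a quick check, the short sketch below derives the ratio from the two per-step times quoted above. It assumes the number of training steps is the same on both systems, which the post does not state explicitly; the function name is made up for illustration.

```python
def training_time_fraction(new_step_s: float, old_step_s: float) -> float:
    """Fraction of the old training time needed by the new system.

    Assumption (not stated in the post): both systems run the same
    number of training steps, so total time scales with time per step.
    """
    if new_step_s <= 0 or old_step_s <= 0:
        raise ValueError("step times must be positive")
    return new_step_s / old_step_s


# Per-step times quoted in the post, both measured on an NVIDIA K20 GPU.
tensorflow_step_s = 0.7
distbelief_step_s = 3.0

fraction = training_time_fraction(tensorflow_step_s, distbelief_step_s)
speedup = 1.0 / fraction

print(f"Training time relative to DistBelief: {fraction:.1%}")  # 23.3%
print(f"Speedup per training step: {speedup:.1f}x")             # 4.3x
```

Under that assumption the per-step times give roughly 23 percent of the original training time, in line with the "just 25 percent" stated in the quote.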
Using CUDA and the TensorFlow deep learning framework, Google trains Show and Tell by showing it images along with captions that people wrote for those images. Sometimes, if the model thinks a new image shows exactly what it saw in a previous image, it falls back on the caption for that previous image. At other times, Show and Tell is able to come up with original captions. “Moreover,” Shallue wrote, “it learns how to express that knowledge in natural-sounding English phrases despite receiving no additional language training other than reading the human captions.”

Figure: The model generates a completely new caption using concepts learned from similar scenes in the training set.

The initial training phase took nearly two weeks on a single Tesla K20 GPU; according to Google, running the same code on a CPU would be about 10 times slower.