Google released the latest version of Show and Tell, its automatic image captioning model, which is more accurate and much faster to train than the original system.
“The TensorFlow implementation released today achieves the same level of accuracy with significantly faster performance: time per training step is just 0.7 seconds in TensorFlow compared to 3 seconds in DistBelief (a system Google previously used for generating image captions) on an NVIDIA K20 GPU, meaning that total training time is just 25 percent of the time previously required,” Chris Shallue, a software engineer on the Google Brain team, wrote in a blog post.
Using CUDA and the TensorFlow deep learning framework, Google trains Show and Tell by showing it images along with the captions that people wrote for those images. Sometimes, if the model thinks it sees something going on in a new image that’s exactly like a previous image it has seen, it falls back on the caption for that previous image. But at other times, Show and Tell is able to come up with original captions. “Moreover,” Shallue wrote, “it learns how to express that knowledge in natural-sounding English phrases despite receiving no additional language training other than reading the human captions.”
The initial training phase took nearly two weeks on a single Tesla K20 GPU, and the team notes that it would be about 10 times slower if the code were run on a CPU.
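As a rough sanity check on the figures quoted above, the arithmetic can be sketched in a few lines of Python. The per-step times and the 10x CPU slowdown come from the article; treating "nearly two weeks" as 14 days, assuming the number of training steps is unchanged, and the helper names are all assumptions made for illustration.

```python
# Figures quoted in the article.
TF_STEP_SECONDS = 0.7          # time per training step in TensorFlow
DISTBELIEF_STEP_SECONDS = 3.0  # time per training step in DistBelief
CPU_SLOWDOWN = 10              # CPU is said to be about 10 times slower

# Assumption: "nearly two weeks" is rounded up to 14 days.
GPU_TRAINING_DAYS = 14


def step_time_ratio(new_seconds, old_seconds):
    """Fraction of the old per-step time that the new system needs."""
    return new_seconds / old_seconds


def estimated_cpu_days(gpu_days, slowdown):
    """Rough CPU training time, assuming a constant slowdown factor."""
    return gpu_days * slowdown


ratio = step_time_ratio(TF_STEP_SECONDS, DISTBELIEF_STEP_SECONDS)
cpu_days = estimated_cpu_days(GPU_TRAINING_DAYS, CPU_SLOWDOWN)

print(f"Per-step time ratio: {ratio:.1%}")
print(f"Estimated CPU training time: {cpu_days} days (~{cpu_days // 7} weeks)")
```

Under these assumptions the per-step ratio works out to roughly 23 percent, in line with the rounded "25 percent" in the quote, and CPU-only training would take on the order of 20 weeks.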
Google Open-Sources Image Captioning Intelligence

Sep 27, 2016