Today, Facebook introduced a new feature that automatically generates text descriptions of pictures using advanced object recognition technology.
Until now, people using screen readers would only hear the name of the person who shared the photo, followed by the term “photo” when they came upon an image in News Feed. Now they will get a richer description of what’s in a photo. For instance, someone could now hear, “Image may contain three people, smiling, outdoors.”
The Facebook researchers noted that it took nearly ten months to roll the feature out publicly, as they had to train their deep learning models to recognize more than just the people in the images. People mostly care about who is in a photo and what they are doing, but sometimes the background is what makes the photo interesting or significant.
While that may be intuitive to humans, it is quite challenging to teach a machine to provide as much useful information as possible while acknowledging the social context.
Their neural network models have a million parameters, but the team carefully selected a set of about 100 concepts based on prominence in photos as well as the accuracy of the visual recognition system. They also favored concepts with very specific meanings, such as smiling, jewelry, cars, and boats, and avoided concepts that are open to interpretation. Currently, they ensure that the object detection algorithm reaches a minimum precision of 0.8 for each of these concepts.
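To make the 0.8 precision requirement concrete, here is a minimal Python sketch of how a per-concept score cutoff could be chosen from labeled validation data and then used to build the "Image may contain ..." sentence. This is not Facebook's implementation: the function names `pick_threshold` and `describe`, the score values, and the fallback to the plain word "photo" are all assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence

MIN_PRECISION = 0.8  # the minimum precision figure quoted in the article


def pick_threshold(scores: Sequence[float], labels: Sequence[bool],
                   min_precision: float = MIN_PRECISION) -> Optional[float]:
    """Return the lowest score cutoff whose precision on the validation
    data is at least min_precision, or None if no cutoff qualifies.

    Hypothetical helper: scores are model confidences for one concept,
    labels say whether the concept is really present in each photo.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best = None
    true_positives = 0
    for count, (score, is_present) in enumerate(ranked, start=1):
        true_positives += is_present
        # Evaluate only at the end of a group of tied scores, so that
        # `count` is exactly the number of photos with score >= cutoff.
        if count < len(ranked) and ranked[count][0] == score:
            continue
        if true_positives / count >= min_precision:
            best = score
    return best


def describe(predictions: Dict[str, float], thresholds: Dict[str, float]) -> str:
    """Build an alt-text string from the concepts that clear their cutoff."""
    kept = [concept for concept, score in predictions.items()
            if concept in thresholds and score >= thresholds[concept]]
    if not kept:
        return "photo"  # assumed fallback: the generic term used before the feature
    return "Image may contain " + ", ".join(kept) + "."


# Illustrative, made-up numbers for a single concept.
cutoff = pick_threshold([0.9, 0.8, 0.7, 0.6, 0.5],
                        [True, True, True, False, False])
text = describe({"people": 0.95, "smiling": 0.85, "outdoor": 0.4},
                {"people": 0.7, "smiling": 0.8, "outdoor": 0.6})
```

Raising the cutoff trades coverage for reliability: a concept is mentioned less often, but is more likely to be correct when it is mentioned, which fits the hedged "may contain" wording of the descriptions.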
Artificial Intelligence Helps the Blind ‘See’ Facebook
Apr 05, 2016
Related resources
- DLI course: Building a Brain in 10 Minutes
- GTC session: Bringing Advanced AI and Navigation into Smart Glasses that Empower the Blind
- GTC session: Revolutionizing Vision AI: From 2D to 3D Worlds
- GTC session: Reward Fine-Tuning for Faster and More Accurate Unsupervised Object Discovery
- NGC Containers: retail-shopping-advisor-chatbot-service
- NGC Containers: retail-shopping-advisor-frontend-service