Anton van den Hengel, director of The University of Adelaide’s Australian Centre for Visual Technologies, shares how his research group is working on Visual Question Answering (VQA), which uses deep learning to understand the contents of an image and answer natural-language questions about it.
“The data has been around for a while, but really the GPU technology coming in and allowing us to extract the value out of this data has been the big breakthrough,” said Professor van den Hengel, whose team placed second in the 2016 ImageNet Scene Parsing Challenge.
Below are two example results from a paper by Professor van den Hengel’s lab, “The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions.” Given an image-question pair, the model generates not only an answer but also a set of reasons (as text) and visual attention maps. The colored words in the question carry the top three attention weights, shown in red, blue, and cyan, in that order.
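To make the attention idea concrete, here is a minimal PyTorch sketch of question-word attention in a VQA-style model: each word’s recurrent state is scored against a pooled image feature, and the resulting softmax weights are the kind of per-word scores that such papers visualize as colored words. This is an illustrative sketch under assumed names and dimensions, not the VQA-Machine architecture itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionAttentionVQA(nn.Module):
    """Toy VQA head (illustrative only): attends over question words
    conditioned on an image feature, then predicts an answer over a
    fixed answer vocabulary."""

    def __init__(self, vocab_size, num_answers, embed_dim=300,
                 hidden_dim=512, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(hidden_dim * 2, num_answers)

    def forward(self, question_ids, img_feat):
        # question_ids: (B, T) word indices; img_feat: (B, img_dim)
        word_states, _ = self.gru(self.embed(question_ids))      # (B, T, H)
        img = self.img_proj(img_feat).unsqueeze(1)               # (B, 1, H)
        # Score each word against the image, then normalize over words.
        scores = self.att_score(torch.tanh(word_states + img))   # (B, T, 1)
        weights = F.softmax(scores.squeeze(-1), dim=1)           # (B, T)
        # Attention-weighted question summary, fused with the image.
        q_summary = (weights.unsqueeze(-1) * word_states).sum(1) # (B, H)
        logits = self.classifier(torch.cat([q_summary, img.squeeze(1)], -1))
        return logits, weights

# Hypothetical usage: the three most-attended words are the ones a
# visualization would highlight in red, blue, and cyan.
model = QuestionAttentionVQA(vocab_size=1000, num_answers=100)
q = torch.randint(0, 1000, (1, 6))   # a 6-word question (random ids)
v = torch.randn(1, 2048)             # a pooled CNN image feature
logits, weights = model(q, v)
top3 = weights[0].topk(3).indices    # indices of the top-3 weighted words
```

The design choice to keep attention weights as an explicit output, rather than an internal detail, is what makes per-word visualizations like the colored questions above possible.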
Share your GPU-accelerated science with us at http://nvda.ws/2cpa2d4 and with the world on #ShareYourScience.
Watch more scientists and researchers share how accelerated computing is benefiting their work at http://nvda.ws/2dbscA7.