Anton van den Hengel, director of The University of Adelaide’s Australian Centre for Visual Technologies shares how his research group is working on Visual Question Answering (VQA) which uses deep learning to understand the contents of an image.
“The data has been around for a while, but really the GPU technology coming in and allowing us to extract the value out of this data has been the big breakthrough,” said Professor van den Hengel whose team placed second in the 2016 ImageNet Scene Parsing Challenge.
Below are two example results from Professor van den Hengel’s lab paper, “The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions”. Given an image-question pair, their model generates not only an answer, but also a set of reasons (as text) and visual attention maps. The colored words in the question have Top-3 weights, ordered as red, blue and cyan.
Share your GPU-accelerated science with us at http://nvda.ws/2cpa2d4 and with the world on #ShareYourScience.
Watch more scientists and researchers share how accelerated computing is benefiting their work at http://nvda.ws/2dbscA7
Related resources
- DLI course: Getting Started with Image Segmentation
- GTC session: The Visionaries: A Cross-Industry Exploration of Computer Vision
- GTC session: Mitigating Spurious Correlations for Medical Image Classification via Natural Language Concepts
- GTC session: Computer Vision for Rare Disease Genomic Medicine
- NGC Containers: MATLAB
- Webinar: Transforming Warehouse Operation Management Using Computer Vision and Digital Twins