GPU-Trained System Understands Movies

Researchers from Karlsruhe Institute of Technology, MIT and University of Toronto published MovieQA, a dataset that contains 7702 reasoning questions and answers from 294 movies. The dataset and its accuracy metrics provide a well-defined challenge for question-answering machine learning algorithms.
The questions range from simpler ‘Who’ did ‘What’ to ‘Whom’ questions, which can be solved by computer vision alone, to ‘Why’ and ‘How’ something happened in the movie, which can only be solved by exploiting both the visual information and the dialog.

MovieQA is unique in that it contains multiple sources of information – full-length movies, plot synopses, subtitles, scripts and DVS (a service that narrates movie scenes to the visually impaired).
To scale to large-vocabulary data sets, the researchers relied on a TITAN Black GPU to train on the large volume of data.
In early 2016, the researchers plan to create an online benchmark with 15,000 questions and 75,000 answers, which will encourage others to contribute.
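
As a rough illustration of what an accuracy metric on such a benchmark could look like, here is a minimal Python sketch. It assumes a multiple-choice setup in which every question has the same number of candidate answers and exactly one of them is correct; that setup is inferred from the 15,000 questions and 75,000 answers quoted above, not stated in this post. The function names `multiple_choice_accuracy` and `chance_accuracy` are hypothetical and are not part of any MovieQA tooling.

```python
def multiple_choice_accuracy(predictions, answers):
    """Fraction of questions where the predicted option index matches the correct one."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def chance_accuracy(num_questions, num_answers):
    """Expected accuracy of uniform random guessing.

    Assumption: answers are split evenly across questions and exactly
    one candidate per question is correct.
    """
    options_per_question = num_answers / num_questions
    return 1 / options_per_question


if __name__ == "__main__":
    # Figures quoted for the planned benchmark: 15,000 questions, 75,000 answers.
    print(chance_accuracy(15_000, 75_000))

    # Toy example: four questions, three answered correctly.
    print(multiple_choice_accuracy([0, 1, 2, 3], [0, 1, 0, 3]))
```

Under these assumptions, the planned benchmark would have five candidate answers per question, so a system that guesses at random would be right about 20% of the time. Any reported accuracy is best read against that baseline.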
Read the research paper >>
