Researchers from Google and Stanford have taught their computer vision model to detect the most important person in a multi-person video scene: for example, identifying the shooter in a basketball game, where a single scene typically contains dozens or hundreds of people.
The team used 20 Tesla K40 GPUs and the cuDNN-accelerated TensorFlow deep learning framework to train their recurrent neural network on 257 NCAA basketball games from YouTube. An attention mask selects which of the several people in a scene are most relevant to the action being performed, then tracks the relevance of each object as time proceeds. The team published a paper detailing more of their work.
Over time the system can identify not only the most important actor, but also potentially important actors and the events with which they are associated. For example, it can understand that the player going up for a layup could be important, but that the most important player is the one who then blocks the shot.
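The idea of an attention mask that weights each person in a frame can be illustrated with a minimal sketch. This is not the authors' model: the dot-product scoring, the `attend` helper, the toy feature vectors, and the per-frame `queries` (standing in for the recurrent network's state) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(player_features, query):
    """Score every player against the query and pool their features.

    player_features: list of per-player feature vectors (hypothetical inputs).
    query: vector standing in for the recurrent state at this frame (assumption).
    Returns (weights, pooled), where weights sum to 1 across players.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in player_features]
    weights = softmax(scores)
    pooled = [
        sum(w * feat[i] for w, feat in zip(weights, player_features))
        for i in range(len(query))
    ]
    return weights, pooled


# Toy clip: 3 players, 2 frames, 2-D features (made-up numbers).
frames = [
    [[1.0, 0.0], [0.0, 1.0], [0.2, 0.2]],
    [[0.1, 0.1], [0.0, 2.0], [1.5, 0.0]],
]
queries = [[2.0, 0.0], [0.0, 2.0]]  # hypothetical per-frame state

# Track which player receives the highest attention weight in each frame.
key_actor_per_frame = []
weights_per_frame = []
for feats, q in zip(frames, queries):
    weights, _ = attend(feats, q)
    weights_per_frame.append(weights)
    key_actor_per_frame.append(max(range(len(weights)), key=weights.__getitem__))
```

In this toy clip the highest-weighted player changes from one frame to the next, which mirrors the layup-then-block example: relevance is recomputed at every time step rather than fixed for the whole event.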
Teaching an AI to Detect Key Actors in Multi-person Videos
Jul 06, 2016