Algorithm Achieves Better Accuracy Than Humans at Reading Lips

Researchers at the University of East Anglia in the UK developed an algorithm that is able to interpret mouthed words with a greater degree of accuracy than human lip readers.
Using Tesla K80 GPUs, the researchers trained a deep learning model to recognize mouth shapes corresponding to certain sounds as they are spoken, without any audio input cues at all.
“We’re looking at visual cues and saying how do they vary? We know they vary for different people. How are they using them? What’s the differences? And can we actually use that knowledge in this particular training method for our model? And we can,” says Dr. Helen Bear, who created the visual speech recognition system as part of her PhD, along with Prof. Richard Harvey of UEA’s School of Computing Sciences.

Figure: Examples of features captured for improving lip reading accuracy. The green marks relate to key points used in Active Appearance Models when tracking a speaker’s face.

According to Dr. Bear, the core challenge is that speech contains more distinct sounds than distinct visual cues. For example, the sounds ‘/p/,’ ‘/b/,’ and ‘/m/’ have confusingly similar lip shapes, and all of them typically cause difficulties for human lip readers. UEA’s visual speech model distinguishes between these visually similar lip shapes more accurately.
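To make the many-to-one problem concrete, here is a minimal Python sketch. It assumes a small, hand-written table that maps sounds (phonemes) to lip-shape classes (visemes). The table, the class names, and the helper functions are hypothetical illustrations, not the classes or code used in the UEA system.

```python
from collections import defaultdict

# Hypothetical, illustrative phoneme-to-viseme table.
# These groupings are an assumption for demonstration only,
# not the classes used by the UEA model.
PHONEME_TO_VISEME = {
    "p": "bilabial",      # lips pressed together
    "b": "bilabial",
    "m": "bilabial",
    "f": "labiodental",   # lower lip against upper teeth
    "v": "labiodental",
    "t": "alveolar",
    "ae": "open_vowel",
}


def visually_confusable(mapping):
    """Return visemes shared by more than one phoneme."""
    groups = defaultdict(list)
    for phoneme, viseme in mapping.items():
        groups[viseme].append(phoneme)
    return {v: sorted(p) for v, p in groups.items() if len(p) > 1}


def to_visemes(phonemes, mapping=PHONEME_TO_VISEME):
    """Translate a phoneme sequence into the lip shapes a viewer would see."""
    return [mapping[p] for p in phonemes]


if __name__ == "__main__":
    # "pat", "bat", and "mat" differ in sound but not in lip shape.
    for word in (["p", "ae", "t"], ["b", "ae", "t"], ["m", "ae", "t"]):
        print(word, "->", to_visemes(word))
    print(visually_confusable(PHONEME_TO_VISEME))
```

Under this toy table, the three words produce the same viseme sequence, which is why a lip reader, human or machine, needs additional cues to tell them apart.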
This technology may one day help people who have hearing and speech impairments, generate audio for video-only security footage, or improve poor audio quality on mobile video calls.
