GTC Silicon Valley-2019: Extreme Neural Network Computing Transforms Speech Quality


GTC Silicon Valley-2019 ID: S9247: Extreme Neural Network Computing Transforms Speech Quality

Chris Rowen (BabbleLabs)
We'll explore in depth the application of deep learning to advanced speech processing. We'll show how novel neural network architectures and training methods, combined with audio signal processing, deliver near-perfect separation of speech from background sounds, even in the face of heavy reverberation and non-stationary noise. Our talk highlights the unprecedented data gathering, augmentation methods, and parallel training compute that allow us to leverage thousands of hours of unique speech content. We'll delve into the software development, API, GPU-based cloud deployment, and embedded library methods that work with the neural network to enable next-generation audio and video production, media streaming, and telephony systems. We'll also discuss the likely trajectory of deep learning technology for speech enhancement, speech recognition, speaker identification, and seamless human-machine interfaces.
