GTC Silicon Valley-2019: Revolutionary Voice Enhancement in Real-Time Communications with GPU


GTC Silicon Valley-2019 ID: S9222: Revolutionary Voice Enhancement in Real-Time Communications with GPU

Davit Baghdasaryan (2Hz, Inc), Arto Minasyan (2Hz, Inc)
We'll examine the latency and performance challenges of deploying deep learning technologies that improve voice quality in real-time communications. We'll explain how deep learning changes traditional voice enhancement (e.g., noise cancellation) and cover our work using deep learning to eliminate the need for multiple microphones, a requirement that enforces a specific form factor such as a phone or headset. We'll show how moving this processing to software offers the flexibility to deploy the technology on headsets, mobile devices, and laptops, as well as in the network. We'll describe how we power and scale our DL-based algorithm on GPUs, which scale up to 100 times better than CPUs for server-side processing. We'll also discuss how we used CUDA and TensorRT to fit within the constraint of 12 ms latency on an end-to-end real-time call.
