California-based startup BabbleLabs is working to enhance speech quality, accuracy, and personalization. The company recently announced a new deep learning product that relies on GPUs end to end to perform tasks such as speech enhancement and noise reduction on the audio from standard video or audio recordings.
“Our first product, Clear Cloud, brings to market BabbleLabs’ new industry-leading AI compute techniques,” said Chris Rowen, CEO at BabbleLabs. “This is the first of many products in our roadmap that will help democratize speech enhancement technology for everyday applications used in real-world environments.”
Using NVIDIA Tesla V100 GPUs on Google Cloud with the cuDNN-accelerated TensorFlow deep learning framework, the company trained its neural network on hundreds of thousands of hours of unique noisy speech.
For inference, the company uses the same NVIDIA Tesla V100 GPUs in the cloud that it used for training. The neural network delivers impressive results, allowing the technology to handle a comprehensive range of vocabulary, accents, and languages.
“The sheer performance of GPUs, combined with their robust support in deep learning programming environments, allows us to train bigger, more complex networks with vastly more data and deploy them commercially at low cost,” Rowen said. “GPUs are a key element in BabbleLabs’ delivery of the world’s best speech enhancement technology.”
The company recently published a detailed blog post that explains its use of GPUs and deep learning. The Clear Cloud API for speech enhancement is available on this product page.