GTC Silicon Valley 2019, ID: S9776: Scaled Speech and Language Technology in the Contact Center

Wonkyum Lee (Gridspace), Anthony Scodary (Gridspace)
We'll describe how a large data scale (over two millennia of speech data per year) and low-latency requirements have both enabled and required novel approaches to several speech and language models. Our talk will cover the GPU speech recognition training pipeline, continuous feedback-based training, training optimizations, and inference on TensorRT for ultra-low-latency text-to-speech models for call centers. We will discuss accuracy and latency benchmarks for speech recognition on conversational speech, speech synthesis, data-driven dialogue systems, emotion recognition, and speech act classification. We'll also demonstrate our system running in a scaled, simulated call center, showing live speech recognition, synthesis, and language processing.