Microsoft reached a new milestone in the development of more accurate speech recognition.
Using a cluster of Tesla M40 GPUs and the cuDNN-accelerated version of the Computational Network Toolkit (CNTK), the company's latest speech recognition system achieved the lowest word error rate (WER) reported in the industry.
“Our best single system achieves an error rate of 6.9% on the NIST 2000 Switchboard set,” said the researchers in their recent research paper. “We believe this is the best performance reported to date for a recognition system not based on system combination. An ensemble of acoustic models advances the state of the art to 6.3% on the Switchboard test data.”
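For readers unfamiliar with the metric, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that standard definition only; it is not the researchers' scoring pipeline, the function name and sample transcripts are made up for illustration, and official NIST scoring also applies text normalization that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, not taken from the Switchboard data.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over lazy dog"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.1%}")
```

In this toy example one word is substituted and one is dropped out of nine reference words, so the WER is 2/9. By the same definition, a 6.3% error rate corresponds to roughly six word errors for every hundred reference words.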
These advances will directly benefit the future of digital assistants like Cortana, as well as Microsoft's real-time Skype Translator service. Microsoft said, “the speech research is significant to Microsoft’s overall artificial intelligence strategy of providing systems that can anticipate users’ needs instead of responding to their commands, and to the company’s overall ambitions for providing intelligent systems that can see, hear, speak and even understand, augmenting how humans work today.”
Microsoft’s Voice Recognition Technology Almost as Accurate as Humans
Sep 15, 2016