Researchers from around the world working on speech applications are gathering this month for INTERSPEECH, a conference focused on the latest research and technologies in speech processing. NVIDIA researchers will present papers on groundbreaking research in speech recognition and speech synthesis.

Conversational AI research is fueling innovations in speech processing that help computers communicate more like humans and add value to organizations.

Accepted papers from NVIDIA at this year’s INTERSPEECH feature the newest speech technology advancements, from free fully formatted speech datasets to new model architectures that deliver state-of-the-art performance.

Here are a couple of featured projects:

TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction 
Authors: Stanislav Beliaev, Boris Ginsburg
From the abstract: This model has only 13.2M parameters, almost 2x less than the present state-of-the-art text-to-speech models. The non-autoregressive architecture allows for fast training and inference. The small model size and fast inference make the TalkNet an attractive candidate for embedded speech synthesis.

This talk will be live on Thursday, September 2, 2021, at 4:45 pm CET (7:45 am PST).

Compressing 1D Time-Channel Separable Convolutions Using Sparse Random Ternary Matrices
Authors: Gonçalo Mordido, Matthijs Van Keirsbilck, Alexander Keller
From the abstract: For command recognition on Google Speech Commands v1, we improve the state-of-the-art accuracy from 97.21% to 97.41% at the same network size. For speech recognition on Librispeech, we halve the number of weights to be trained while only sacrificing about 1% of the floating-point baseline’s word error rate.

This talk will be live on Friday, September 3, 2021, at 4 pm CET (7 am PST).

View the full schedule of NVIDIA activities >>>