Technical Walkthrough

Text Normalization and Inverse Text Normalization with NVIDIA NeMo

Text normalization (TN) converts text from written form into its verbalized form, and it is an essential preprocessing step before text-to-speech (TTS). TN... 9 MIN READ
Technical Walkthrough

Dynamic Scale Weighting Through Multiscale Speaker Diarization

Speaker diarization is the process of segmenting audio recordings by speaker labels and aims to answer the question “Who spoke when?”. It makes a clear... 10 MIN READ
Technical Walkthrough

Improving Japanese Language ASR by Combining Convolutions with Attention Mechanisms

Automatic speech recognition (ASR) research generally focuses on high-resource languages such as English, which is supported by hundreds of thousands of hours... 5 MIN READ
Technical Walkthrough

Changing CTC Rules to Reduce Memory Consumption in Training and Decoding

Loss functions for training automatic speech recognition (ASR) models are not set in stone. The older rules of loss functions are not necessarily optimal.... 8 MIN READ
News

Attend Expert-Led Developer Sessions at GTC 2022

Register now and get ready to explore cutting-edge technology and the latest developer tools at GTC. < 1 MIN READ
News

New NVIDIA Neural Graphics SDKs Make Metaverse Content Creation Available to All

A dozen tools and programs, including new releases NeuralVDB and Kaolin Wisp, make 3D content creation easy and fast for millions of designers and creators. < 1 MIN READ