Speech & Audio Processing

Jun 09, 2026

Evaluate Clinical ASR Models Faster with Agent Skills and NVIDIA Nemotron Speech

Training a speech AI model to correctly recognize or synthesize clinical terminology is surprisingly difficult. Drug names like Acetaminophen, Amlodipine,...

13 MIN READ

Oct 22, 2024

Multi-Agent AI and GPU-Powered Innovation in Sound-to-Text Technology

The Automated Audio Captioning task centers around generating natural language descriptions from audio inputs. Given the distinct modalities between the input...

7 MIN READ

Aug 05, 2024

Developing Robust Georgian Automatic Speech Recognition with FastConformer Hybrid Transducer CTC BPE

Building an effective automatic speech recognition (ASR) model for underrepresented languages presents unique challenges due to limited data resources....

9 MIN READ

Apr 18, 2024

New Standard for Speech Recognition and Translation from the NVIDIA NeMo Canary Model

NVIDIA NeMo is an end-to-end platform for the development of multimodal generative AI models at scale anywhere—on any cloud and on-premises. The NeMo team just...

4 MIN READ

Apr 18, 2024

Turbocharge ASR Accuracy and Speed with NVIDIA NeMo Parakeet-TDT

NVIDIA NeMo, an end-to-end platform for developing multimodal generative AI models at scale anywhere—on any cloud and on-premises—recently released...

6 MIN READ

Apr 18, 2024

Pushing the Boundaries of Speech Recognition with NVIDIA NeMo Parakeet ASR Models

NVIDIA NeMo, an end-to-end platform for the development of multimodal generative AI models at scale anywhere—on any cloud and on-premises—released the Parakeet...

6 MIN READ

Mar 19, 2024

NVIDIA Speech and Translation AI Models Set Records for Speed and Accuracy

Speech and translation AI models developed at NVIDIA are pushing the boundaries of performance and innovation. The NVIDIA Parakeet automatic speech recognition...

8 MIN READ

Jan 16, 2024

New Support for Dutch and Persian Released by NVIDIA NeMo ASR

Breaking barriers in speech recognition, NVIDIA NeMo proudly presents pretrained models tailored for Dutch and Persian—languages often overlooked in the AI...

2 MIN READ

Nov 07, 2023

Video: Exploring Speech AI from Research to Practical Production Applications

The integration of speech and translation AI into our daily lives is rapidly reshaping our interactions, from virtual assistants to call centers and augmented...

2 MIN READ

Sep 20, 2023

Workshop: Building Conversational AI Applications

Learn how to build and deploy production-quality conversational AI apps with real-time transcription and NLP.

1 MIN READ

Sep 01, 2023

Speeding Up Text-To-Speech Diffusion Models by Distillation

Every year, as part of their coursework, students from the University of Warsaw, Poland get to work under the supervision of engineers from the NVIDIA Warsaw...

7 MIN READ

Aug 10, 2023

Mastering LLM Techniques: Customization

Large language models (LLMs) are becoming an integral tool for businesses to improve their operations, customer interactions, and decision-making processes....

12 MIN READ

Jul 10, 2023

Adapting LLMs to Downstream Tasks Using Federated Learning on Distributed Datasets

Large language models (LLMs), such as GPT, have emerged as revolutionary tools in natural language processing (NLP) due to their ability to understand and...

7 MIN READ

Jun 14, 2023

Boost Your AI Workflows with Federated Learning Enabled by NVIDIA FLARE

One of the main challenges for businesses leveraging AI in their workflows is managing the infrastructure needed to support large-scale training and deployment...

7 MIN READ

Jun 14, 2023

How to Get Better Outputs from Your Large Language Model

Large language models (LLMs) have generated excitement worldwide due to their ability to understand and process human language at a scale that is...

13 MIN READ

Jun 06, 2023

Unlocking Speech AI Technology for Global Language Users: Top Q&As

Voice-enabled technology is becoming ubiquitous. But many are being left behind by an anglocentric and demographically biased algorithmic world. Mozilla Common...

10 MIN READ