At GTC 2022, NVIDIA announced major updates to its suite of NVIDIA AI software that help developers build real-time speech AI applications, create high-performing recommenders at scale, optimize inference in every application, and more. Watch the keynote from CEO Jensen Huang to learn about the latest advancements from NVIDIA.
Announcing NVIDIA Riva 2.0
Today, NVIDIA announced the general availability of Riva 2.0. Riva is a GPU-accelerated speech AI SDK that provides models, tools, and fully optimized speech recognition and text-to-speech pipelines for real-time applications.
Highlights include:
- World-class automatic speech recognition in seven languages.
- Neural text-to-speech that generates high-quality, human-like voices.
- Domain-specific customization with TAO Toolkit and NeMo.
- Support for running in the cloud, on-premises, and on embedded platforms.
NVIDIA also announced Riva Enterprise, which gives enterprises with large-scale deployments access to speech experts at NVIDIA. Enterprises can try Riva through guided labs on ready-to-run infrastructure in NVIDIA LaunchPad.
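Real-time speech pipelines like the ones Riva provides typically stream audio to the server in small, fixed-duration chunks rather than whole files. A minimal sketch of that client-side chunking logic (the chunk duration and function names are illustrative, not Riva's API):

```python
# Split a mono 16-bit PCM buffer into fixed-duration chunks for streaming ASR.
# Chunk duration and sample rate are illustrative; a real client follows the
# server's streaming configuration.

def audio_chunks(pcm: bytes, sample_rate: int = 16_000, chunk_ms: int = 100):
    """Yield successive chunks of raw 16-bit PCM audio."""
    bytes_per_chunk = sample_rate * chunk_ms // 1000 * 2  # 2 bytes per sample
    for start in range(0, len(pcm), bytes_per_chunk):
        yield pcm[start:start + bytes_per_chunk]

# Example: 1 second of silence at 16 kHz yields ten 100 ms chunks.
one_second = bytes(16_000 * 2)
chunks = list(audio_chunks(one_second))
```

Each chunk would then be sent over the streaming connection as it is produced, which is what keeps end-to-end latency low enough for real-time transcription.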
Add this GTC session to your calendar to learn more:
Announcing NVIDIA Merlin 1.0: Hyperscale ML and DL Recommender Systems on CPUs and GPUs
Today, NVIDIA announced NVIDIA Merlin 1.0, an end-to-end framework designed to accelerate recommender workflows across data preprocessing, feature transforms, training, optimization, and deployment. With this latest release of NVIDIA Merlin, data scientists and machine learning engineers can scale faster with less code. The new capabilities offer quick iteration over features and models, as well as deployment of fully trained recommender pipelines (feature transforms, retrieval, and ranking models) as an inference microservice.
Highlights include:
- Merlin Models, a new library for data scientists to train and deploy recommender models in less than 50 lines of code.
- Merlin Systems, a new library for machine learning engineers to easily deploy recommender pipelines as an ensembled Triton microservice.
- Support for large-scale multi-GPU, multi-node inference, as well as for less compute-intensive workloads.
For more information about the latest release, download and try NVIDIA Merlin.
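As background on the retrieval-and-ranking pattern that Merlin pipelines deploy, here is a minimal NumPy sketch of dot-product retrieval over toy embeddings (the table sizes and data are made up for illustration; Merlin's actual APIs differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding tables: 4 users and 8 items in a shared 16-dim space.
# In a real system these come from a trained retrieval model.
user_emb = rng.normal(size=(4, 16))
item_emb = rng.normal(size=(8, 16))

def retrieve_top_k(user_id: int, k: int = 3) -> np.ndarray:
    """Score every item by dot product with the user embedding; return top-k item ids."""
    scores = item_emb @ user_emb[user_id]   # one score per item, shape (8,)
    return np.argsort(scores)[::-1][:k]     # highest-scoring items first

candidates = retrieve_top_k(0)
```

In production, this retrieval stage narrows millions of items down to a few hundred candidates, which a heavier ranking model then reorders; Merlin Systems packages both stages behind a single Triton endpoint.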
Add these GTC sessions to your calendar to learn more:
- Building and Deploying Recommender Systems Quickly and Easily with NVIDIA Merlin (NVIDIA)
- Scaling Real-time Deep Learning Recommendation Inference at a 150M+ User Scale (ShareChat)
- Large-scale Recommendation System on GPU at Life-Service Scenario (Meituan)
- Building Recommender Systems More Easily using Merlin Models (NVIDIA)
Announcing new features in NVIDIA Triton
Today, NVIDIA announced key updates to NVIDIA Triton. Triton is open-source inference-serving software that brings fast and scalable AI to every application in production.
Highlights include:
- Triton FIL backend: Model explainability with Shapley values and CPU optimizations for better performance.
- Triton Management Service to simplify and automate setting up and managing a fleet of Triton instances on Kubernetes. Alpha release is targeted for the end of March.
- Triton Model Navigator to automate preparing a trained model for production deployment with Triton.
- Fleet Command integration for edge deployment.
- Support for inference on AWS Inferentia, and an MLflow plug-in to deploy MLflow models.
- Kick-start your Triton journey with immediate, short-term access in NVIDIA LaunchPad without needing to set up your own Triton environment.
You can download Triton from the NGC catalog, and access code and documentation on GitHub.
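For context, each model Triton serves lives in a model repository alongside a `config.pbtxt` describing its inputs and outputs. A minimal sketch for a hypothetical image-classification model served through the ONNX Runtime backend (the model name, tensor names, and shapes are illustrative):

```
name: "my_model"
backend: "onnxruntime"
max_batch_size: 8
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

Tools like the Triton Model Navigator mentioned above automate producing and validating this kind of configuration for a trained model.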
Add these GTC sessions to your calendar to learn more:
- Fast, Scalable, and Standardized AI Inference Deployment for Multiple Frameworks, Diverse Models on CPUs and GPUs with Open-source NVIDIA Triton
- Optimal AzureML Triton Model Deployment using the Model Analyzer
Announcing new updates to the NVIDIA NeMo framework
Today, NVIDIA announced the latest version of the NVIDIA NeMo framework, a framework for training large language models (LLMs). With the NeMo framework, research institutions and enterprises can achieve the fastest training for any LLM. It also includes the latest parallelism techniques, data preprocessing scripts, and recipes to ensure training convergence.
Highlights include:
- Hyperparameter tuning tool that automatically creates recipes based on customers’ needs and infrastructure limitations.
- Reference recipes for T5 and mT5 models.
- Cloud support for Azure.
- Distributed data preprocessing scripts to shorten end-to-end training time.
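One recurring piece of bookkeeping in LLM training recipes like those above is relating the global batch size to the micro-batch size, data-parallel degree, and gradient-accumulation steps. A small sketch of that arithmetic (the numbers are illustrative, not values from a NeMo recipe):

```python
def grad_accum_steps(global_batch: int, micro_batch: int, data_parallel: int) -> int:
    """Gradient-accumulation steps needed so that
    global_batch == micro_batch * data_parallel * steps."""
    per_step = micro_batch * data_parallel
    if global_batch % per_step:
        raise ValueError("global batch must be divisible by micro_batch * data_parallel")
    return global_batch // per_step

# Example: 256 data-parallel replicas, micro-batch 2, global batch 2048.
steps = grad_accum_steps(global_batch=2048, micro_batch=2, data_parallel=256)
```

Recipe-generation tools automate exactly this kind of constraint solving, picking batch sizes and parallelism degrees that fit the available GPUs and memory.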
Apply for early access to the NeMo framework.
Add these GTC sessions to your calendar to learn more:
- Building Large-scale, Localized Language Models: From Data Preparation to Training and Deployment to Production
- How to Avoid the Staggering Cost of Training State-of-the-art Large Language Models
- Connect with the Experts: Conversational AI: Build Applications with Riva and Train Multibillion-Parameter Language Models using NeMo Megatron
Announcing new features in NVIDIA Maxine
Today, NVIDIA announced the latest version of NVIDIA Maxine, a suite of GPU-accelerated SDKs that reinvent audio and video communications with AI, elevating standard microphones and cameras for clear online communications. Maxine provides state-of-the-art real-time AI audio, video, and augmented reality features that can be built into customizable, end-to-end deep learning pipelines.
Highlights include:
- Audio super resolution: Improves real-time audio quality by upsampling the audio input stream from an 8 kHz to a 16 kHz, or a 16 kHz to a 48 kHz, sampling rate.
- Acoustic echo cancellation: Cancels acoustic device echo from the input audio stream in real time, eliminating mismatched acoustic pairs and double-talk. AI-based processing achieves more effective cancellation than traditional digital signal processing.
- Noise removal: Removes several common background noises using state-of-the-art AI models while preserving the speaker’s natural voice.
- Room echo cancellation: Removes reverberations from audio using state-of-the-art AI models, restoring clarity of a speaker’s voice.
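Maxine's audio super resolution relies on learned models; for intuition about the sampling-rate change alone, here is a naive linear-interpolation doubling from 8 kHz to 16 kHz (this sketch shows only the resampling arithmetic, not the AI enhancement that reconstructs the missing high frequencies):

```python
import numpy as np

def upsample_2x(signal: np.ndarray) -> np.ndarray:
    """Double the sampling rate by linear interpolation between samples."""
    n = len(signal)
    old_t = np.arange(n)
    new_t = np.arange(2 * n - 1) / 2.0  # twice as dense over the same time span
    return np.interp(new_t, old_t, signal)

# A 4-sample ramp at 8 kHz becomes a 7-sample ramp at 16 kHz.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = upsample_2x(x)
```

Plain interpolation cannot add spectral content above the original Nyquist frequency, which is why an AI model is needed to make the upsampled audio actually sound fuller.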
Add these GTC sessions to your calendar to learn more:
- How Zoom Video Uses Distributed Model Training in a Kubernetes Cluster
- How Avaya’s Real-time Cloud Media Processing Core Maintains Ultra-low Latency with Maxine
- Project Starline: A High-fidelity Telepresence System
- Put Your Body into It! Easy Talent Tracking in Virtual Environments
Register for GTC now to learn more about the latest updates to GPU-accelerated AI technologies.