ICYMI: New AI Tools and Technologies Announced at NVIDIA GTC Keynote

At NVIDIA GTC this November, NVIDIA announced new software tools that help developers build real-time speech applications, optimize inference for a variety of use cases, improve open-source interoperability for recommender systems, and more. Watch the keynote from CEO Jensen Huang to learn about the latest NVIDIA breakthroughs.

Announcing Riva Custom Voice and NVIDIA Riva Enterprise

Today, NVIDIA unveiled a new version of NVIDIA Riva with a Custom Voice feature. With Riva Custom Voice, enterprises can easily create a unique voice to represent their brand.

NVIDIA also announced Riva Enterprise, a paid program that includes NVIDIA expert support for enterprises that want to deploy Riva at large scale. Customers and partners with smaller workloads can continue to use Riva free of charge.

Riva highlights include:

Create a new neural voice from 30 minutes of audio data in one day on an A100 GPU.
Implement world-class speech recognition with support for five other languages.
Scale to hundreds or thousands of real-time streams.
Run in any cloud, on premises, and at the edge.

Try Riva today from the NGC catalog and sign up for the NVIDIA Riva Enterprise interest list.

Learn more at this GTC Session

Conversational AI Demystified.

Announcing TensorRT 8.2 and New PyTorch and TensorFlow Integrations

Today, NVIDIA announced TensorRT 8.2 for production deployment, the latest version of its high-performance deep learning inference optimizer and runtime engine. With new optimizations, inference applications can now run billion-parameter language models in real time and run inference in TensorFlow and PyTorch 3x faster with just one line of code.

Highlights include:

Optimizations for T5 and GPT-2 deliver real-time translation and summarization with 21x faster performance vs CPUs.
Integration of TensorRT with PyTorch and TensorFlow, delivering 3x faster inference with just one line of code in either framework.
Simple Python API for developers using Windows.
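To make the "one line of code" claim concrete, here is a minimal sketch of what the PyTorch side might look like. It assumes the `torch_tensorrt` package is installed and an NVIDIA GPU is present; the helper name `compile_with_tensorrt` and the default input shape are hypothetical, so check the Torch-TensorRT documentation for the options that fit your model.

```python
def compile_with_tensorrt(model, input_shape=(1, 3, 224, 224)):
    """Return a TensorRT-optimized version of a PyTorch module.

    Imports are deferred so this file loads on a machine without PyTorch,
    Torch-TensorRT, or a GPU; actually calling the function needs all three.
    The helper name and the default input shape are illustrative assumptions.
    """
    import torch
    import torch_tensorrt

    # Compile in inference mode with the weights on the GPU.
    model = model.eval().cuda()

    # The "one line": hand the module to the Torch-TensorRT compiler.
    return torch_tensorrt.compile(
        model,
        inputs=[torch_tensorrt.Input(input_shape)],
        enabled_precisions={torch.float, torch.half},  # allow FP32 and FP16 kernels
    )
```

The returned module is called like the original one, for example `optimized = compile_with_tensorrt(my_model)` followed by `optimized(batch.cuda())`, where `my_model` and `batch` stand in for your own model and input tensor.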

Download the TensorFlow-TensorRT integration.

Torch-TensorRT and TensorRT 8.2 will both be available in late November, from the NGC catalog and the TensorRT page respectively.

The latest versions of samples, parsers, and notebooks are always available in the TensorRT open source repo.

Announcing NVIDIA Triton Inference Server 2.15

Today, NVIDIA announced NVIDIA Triton Inference Server 2.15. NVIDIA Triton is open-source inference-serving software that brings fast and scalable AI to production.

Highlights include:

Model Analyzer to determine optimal model execution parameters such as precision, batch size, number of concurrent model instances, and client requests for given latency, throughput, and memory constraints.
RAPIDS Forest Inference Library (FIL) backend to run inference on tree-based models such as gradient-boosted decision trees and random forests.
Multi-GPU, multi-node distributed inference to support giant Transformer-based language models.
Triton is available in all major public clouds – Amazon SageMaker (new), Microsoft Azure, Google Cloud, Alibaba Cloud (new), and Tencent Cloud. Triton can be used in both managed AI platforms and Kubernetes services.
Triton now supports AI inference workloads on Arm CPUs (new), in addition to NVIDIA GPUs and x86 CPUs.
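The kind of decision Model Analyzer automates can be pictured as a constrained search: measure several configurations, discard those that break the latency or memory budget, and keep the one with the highest throughput. The sketch below illustrates only that idea. It is not Model Analyzer's API, and the `Measurement` type, the `best_config` helper, and every number in it are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Measurement:
    """One profiled configuration (all fields are hypothetical)."""
    batch_size: int
    instances: int            # concurrent model instances
    p99_latency_ms: float
    throughput_infer_per_s: float
    gpu_memory_mb: float


def best_config(measurements: List[Measurement],
                max_latency_ms: float,
                max_memory_mb: float) -> Optional[Measurement]:
    """Return the highest-throughput configuration within both budgets."""
    feasible = [
        m for m in measurements
        if m.p99_latency_ms <= max_latency_ms and m.gpu_memory_mb <= max_memory_mb
    ]
    if not feasible:
        return None  # nothing satisfies the constraints
    return max(feasible, key=lambda m: m.throughput_infer_per_s)


# Invented measurements, for illustration only.
measurements = [
    Measurement(1, 1, 4.0, 250.0, 900.0),
    Measurement(8, 1, 12.0, 660.0, 1400.0),
    Measurement(8, 2, 14.0, 1100.0, 2600.0),
    Measurement(32, 2, 41.0, 1500.0, 5200.0),
]

choice = best_config(measurements, max_latency_ms=20.0, max_memory_mb=3000.0)
print(choice)
```

With a 20 ms latency budget and a 3000 MB memory budget, the largest batch size is rejected even though it has the highest raw throughput, and the batch-size-8, two-instance configuration is selected.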

You can download NVIDIA Triton from the NGC catalog and obtain code and documentation on GitHub.

Learn more at this GTC Session

Maximize AI Inference Serving Performance with NVIDIA Triton Inference Server.

Announcing NVIDIA Merlin Extended Open Source Interoperability

Today, NVIDIA announced the latest release of NVIDIA Merlin. NVIDIA Merlin is an open-source framework for end-to-end development of recommender systems, from data preprocessing to model training and inference. NVIDIA continues to release features, libraries, and packages tailored to accelerate recommender workflows.

Highlights include:

Transformers4Rec, a new library, wraps popular Hugging Face Transformer architectures and makes them accessible for building session-based recommender pipelines. This helps predict a user's next actions with little or no user data within a dynamic session.
SparseOperationKit (SOK), a new open-source Python package, supports sparse training and inference for deep learning (DL) and is compatible with common DL frameworks, including TensorFlow.
Most common DL frameworks do not support model parallelism, which makes it challenging to use all available GPUs in a cluster. SOK's compatibility with TensorFlow helps fill that gap.
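For readers new to model parallelism in recommenders, one common form is to split a large embedding table across GPUs so that each device stores only part of it. The sketch below shows a simple modulo-based assignment of feature IDs to devices. It is an illustrative assumption, not a description of how SOK distributes embeddings, and the function names are hypothetical.

```python
from typing import List


def shard_for(feature_id: int, num_gpus: int) -> int:
    """Pick the GPU that owns the embedding row for this feature ID."""
    return feature_id % num_gpus


def partition_ids(feature_ids: List[int], num_gpus: int) -> List[List[int]]:
    """Group a batch of feature IDs by the GPU that stores their embeddings."""
    shards: List[List[int]] = [[] for _ in range(num_gpus)]
    for fid in feature_ids:
        shards[shard_for(fid, num_gpus)].append(fid)
    return shards


batch_ids = [3, 10, 7, 4, 12, 9]  # hypothetical sparse feature IDs
shards = partition_ids(batch_ids, num_gpus=4)
print(shards)  # [[4, 12], [9], [10], [3, 7]]
```

Each GPU then looks up only its own IDs, so no single device has to hold the full table.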

For more information about the latest release, download NVIDIA Merlin.

Announcing the NeMo Framework, Megatron 530B, and Triton Multi-GPU Multi-Node Inference

Today, NVIDIA announced the NeMo Framework, a new capability in NeMo for developing large language models (LLMs). Built on advancements from Megatron, the NeMo Framework enables enterprises to train and scale language models with trillions of parameters.

Highlights include:

Automated data curation tasks such as formatting, deduplication, and blending.
Advanced parallelization techniques such as pipeline, tensor, and data parallelism.
Train a 20-billion-parameter model in less than a month.
Train Megatron 530B, the customizable LLM for new domains and languages.
Scale LLMs to multiple GPUs and nodes for inference with NVIDIA Triton Inference Server.
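A little arithmetic shows why models of this size need several kinds of parallelism at once. The sketch below assumes 16-bit weights (2 bytes per parameter) and the usual convention that the GPU count equals the product of the tensor-, pipeline-, and data-parallel sizes. The cluster size and parallel degrees are hypothetical, not NVIDIA's published training configuration.

```python
def weight_memory_gb(num_parameters: int, bytes_per_parameter: int = 2) -> float:
    """Memory for the weights alone, in GB (assumes 2-byte FP16 weights)."""
    return num_parameters * bytes_per_parameter / 1e9


def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Number of model replicas when each replica spans tensor * pipeline GPUs."""
    gpus_per_replica = tensor_parallel * pipeline_parallel
    if world_size % gpus_per_replica != 0:
        raise ValueError("world_size must be a multiple of tensor_parallel * pipeline_parallel")
    return world_size // gpus_per_replica


# A 530-billion-parameter model: weights alone, before activations or optimizer state.
weights_gb = weight_memory_gb(530_000_000_000)
print(weights_gb)  # 1060.0

# Hypothetical cluster: 64 GPUs, 8-way tensor parallelism, 4-way pipeline parallelism.
replicas = data_parallel_size(world_size=64, tensor_parallel=8, pipeline_parallel=4)
print(replicas)  # 2
```

At roughly 1 TB for the weights alone, the model cannot fit on a single GPU, which is why it has to be split across devices with tensor and pipeline parallelism before data parallelism can add further replicas.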

Sign up for early access to download the NVIDIA NeMo Framework.

Learn more at this GTC Session

NVIDIA NeMo: Speech Recognition, Speech Synthesis, and NLP Updates.

Announcing DeepStream 6.0

Today, NVIDIA announced the latest release of DeepStream, a powerful AI streaming analytics toolkit for building high-performance video analytics applications and services. This new version introduces a low-code programming workflow, support for new data formats and algorithms, and a range of new getting-started resources.

Highlights include: