Pretrained AI Models
Accelerate AI development with production-quality models from the NGC catalog.
What Are Pretrained AI Models?
AI and machine learning models are built on mathematical algorithms and are trained using data and human expertise. These models help us accurately predict outcomes based on input data such as images, text, or language. But building, training, and optimizing production-quality models is expensive, requiring numerous iterations, domain expertise, and countless hours of computation.
Pretrained models have already been trained on representative datasets, so their weights and biases are already tuned. These models can be retrained with custom data in a fraction of the time it takes to train from scratch.
Pretrained Models from the NGC Catalog
With production-ready pretrained AI models from the NGC™ catalog, NVIDIA’s hub of GPU-optimized AI and high-performance computing (HPC) software, data scientists and developers can quickly adapt models or simply deploy them as is for inference.
Diverse Use Cases
NGC’s state-of-the-art pretrained models and resources cover a wide set of use cases, from computer vision to natural language understanding to speech synthesis. These models leverage automatic mixed precision (AMP) on Tensor Cores and can scale from single-node to multi-node systems to speed up training and inference.
The NVIDIA TAO Toolkit makes it easy to adapt and fine-tune the pretrained models with your custom data.
TAO Toolkit abstracts away the AI and deep learning framework complexity and enables you to build production-quality computer vision or conversational AI models in hours rather than months.
Transparent Model Resumes
Just like a resume provides a snapshot of a candidate's skills and experience, model credentials do the same for a model. Many pretrained models include critical parameters such as batch size, training epochs, and accuracy, providing you with the necessary transparency and confidence to pick the right model for your use case.
A Model for Every Use Case
Get started today with models that span diverse use cases, including computer vision, speech, and language understanding.
With computer vision, devices can understand the world around us through images and videos. It uses image classification, object detection and tracking, object recognition, semantic segmentation, and instance segmentation.
License Plate Detection
LPDNet models detect one or more license plate objects from a car image and return a box around each object, along with an LPD label for each object.
PeopleNet models detect one or more physical objects from three categories within an image and return a box around each object, along with a category label for each object. The three categories of objects detected are persons, bags, and faces.
The residual network (ResNet) architecture introduced “skip connections.” The main advantage of these models is that they use residual layers as a building block, which helps gradient propagation during training.
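To make the skip-connection idea concrete, here is a minimal sketch in plain Python. It is a toy illustration, not the ResNet implementation from the NGC catalog: each "layer" is assumed to be a simple scalar multiplication, and the function names are hypothetical. It shows that a residual block computes x + F(x), and that the added identity path keeps the gradient from shrinking toward zero when many layers are stacked.

```python
def residual_block(x, transform):
    """Return x + F(x): the input is added back to the transformed output."""
    return [xi + fi for xi, fi in zip(x, transform(x))]


def plain_chain_gradient(layer_slopes):
    """Gradient through stacked layers of the form f_i(x) = slope_i * x."""
    grad = 1.0
    for slope in layer_slopes:
        grad *= slope  # chain rule: multiply the local derivatives
    return grad


def residual_chain_gradient(layer_slopes):
    """Gradient through stacked layers of the form x + slope_i * x."""
    grad = 1.0
    for slope in layer_slopes:
        grad *= 1.0 + slope  # the skip connection contributes the constant 1
    return grad


if __name__ == "__main__":
    # Toy assumption: 20 layers, each with a small local slope of 0.1.
    slopes = [0.1] * 20
    print("plain gradient:   ", plain_chain_gradient(slopes))
    print("residual gradient:", residual_chain_gradient(slopes))
```

With these toy values the plain chain's gradient collapses to a vanishingly small number, while the residual chain's gradient stays above 1, which is the intuition behind the gradient-propagation benefit described above.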
Natural language processing (NLP) uses algorithms and techniques to enable computers to understand, interpret, manipulate, and converse in human languages. It includes sentiment analysis, speech recognition, speech synthesis, language translation, and natural language generation.
BERT is a transformer-based pretrained language representation model that provides state-of-the-art results on a wide array of NLP tasks, including intent detection and named-entity recognition.
BioBERT checkpoints and scripts help achieve state-of-the-art results in biomedical text-mining benchmark tasks.
This model is based on the Transformer “Big” architecture originally presented in the "Attention Is All You Need" paper by Google. It includes pretrained models for multiple languages.
Speech deals with transcribing audio into text or synthesizing speech from text. It includes automatic speech recognition (ASR) and speech synthesis, also called text-to-speech (TTS).
CitriNet is a QuartzNet variant that utilizes efficient mechanisms such as subword encoding for highly accurate transcription and non-autoregressive connectionist temporal classification (CTC)-based decoding for efficient inference.
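As a rough illustration of why CTC-based decoding is cheap at inference time, the following is a generic greedy CTC decoder in plain Python. It is a sketch of the general technique, not CitriNet's actual decoder: the function name, the tiny subword vocabulary, and the convention that index 0 is the blank symbol are all assumptions made for the example. Each frame is scored independently (non-autoregressive), the best token per frame is taken, consecutive repeats are collapsed, and blanks are dropped.

```python
def ctc_greedy_decode(frame_scores, vocabulary, blank_index=0):
    """Greedy CTC decoding over per-frame token scores.

    frame_scores: list of per-frame score lists, one score per vocabulary entry.
    vocabulary:   list of tokens; vocabulary[blank_index] is the CTC blank.
    """
    # Pick the highest-scoring token for every frame independently.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]

    decoded = []
    previous = None
    for index in best_path:
        # Collapse consecutive repeats, then drop blanks.
        if index != previous and index != blank_index:
            decoded.append(vocabulary[index])
        previous = index
    return "".join(decoded)


if __name__ == "__main__":
    # Hypothetical subword vocabulary; index 0 is the blank symbol.
    vocab = ["<blank>", "hel", "lo"]
    frames = [
        [0.1, 0.8, 0.1],    # "hel"
        [0.2, 0.7, 0.1],    # "hel" again (repeat, collapsed)
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.1, 0.8],    # "lo"
        [0.1, 0.2, 0.7],    # "lo" again (repeat, collapsed)
    ]
    print(ctc_greedy_decode(frames, vocab))  # prints "hello"
```

Because a blank between two identical tokens prevents them from being collapsed, the same scheme can still emit a genuinely repeated token.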
The QuartzNet model is an end-to-end neural acoustic model for ASR based on the Jasper model. It uses separable convolutions and larger filters, making it smaller than Jasper while maintaining comparable accuracy.
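The size reduction from separable convolutions can be seen with simple parameter arithmetic. The sketch below assumes the common depthwise-plus-pointwise form of a 1D separable convolution and ignores bias terms; the function names and the kernel and channel sizes are illustrative choices, not a statement of QuartzNet's actual configuration.

```python
def standard_conv1d_params(kernel_size, in_channels, out_channels):
    """Weights in a regular 1D convolution (bias ignored)."""
    return kernel_size * in_channels * out_channels


def separable_conv1d_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise + pointwise 1D convolution (bias ignored)."""
    depthwise = kernel_size * in_channels  # one filter per input channel
    pointwise = in_channels * out_channels  # 1x1 convolution mixing channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative sizes only: a wide kernel over 256 channels.
    k, c_in, c_out = 33, 256, 256
    standard = standard_conv1d_params(k, c_in, c_out)
    separable = separable_conv1d_params(k, c_in, c_out)
    print(standard)   # 2162688
    print(separable)  # 73984
    print(f"reduction: {standard / separable:.1f}x")
```

Since the kernel size only multiplies the small depthwise term, a separable layer can afford larger filters at a modest parameter cost, which is consistent with the description above.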
The Kaldi Speech Recognition Toolkit project began in 2009 at Johns Hopkins University and is now the de facto speech recognition toolkit in the community, enabling speech services for millions of people every day.
FastPitch and HiFiGAN
The FastPitch model produces a mel spectrogram from raw text, whereas HiFiGAN can generate audio from a mel spectrogram. These models can be combined and trained as an end-to-end pipeline for generating audio from text.
Adapt Models Faster with NVIDIA TAO
NVIDIA Train, Adapt, and Optimize (TAO) is an AI-model-adaptation platform that simplifies and accelerates the creation of enterprise AI applications and services. By fine-tuning pretrained models with custom data through a UI-based, guided workflow, enterprises can produce highly accurate computer vision, speech, and language understanding models in hours rather than months, eliminating the need for large training runs and deep AI expertise.
NGC Catalog Resources
Accelerate your AI development with pretrained models from the NGC catalog.