Transfer Learning Toolkit
Speed up AI training by over 10x and create highly accurate and efficient domain-specific AI models.
Develop like a pro with zero coding.
Creating an AI/ML model from scratch to solve a business problem is capital intensive and time consuming. Transfer learning is a popular technique for carrying the features learned by an existing neural network model over to a new one. The NVIDIA Transfer Learning Toolkit (TLT) is an AI toolkit that abstracts away AI/DL framework complexity and enables you to build production-quality models faster from pre-trained ones, with no coding required.
A toolkit for anyone building AI apps and services, TLT helps reduce the costs of large-scale data collection and labeling, and eliminates the burden of training models from the ground up.
With TLT, you can use NVIDIA’s production-quality pre-trained models and deploy them as is, or apply minimal fine-tuning for various computer vision and conversational AI use cases.
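TLT itself requires no coding, so the following is only a rough illustration of the transfer-learning idea described above, not TLT's API or implementation. It assumes a frozen feature extractor whose weights are reused unchanged and trains only a small new classification head on a toy dataset. The "pre-trained" weights are random stand-ins, and the names `frozen_features` and `train_head` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained backbone: a fixed projection followed by ReLU.
# In a real workflow these weights would come from a model trained on a
# large dataset; here they are random and purely illustrative.
W_backbone = rng.normal(size=(8, 16)) / np.sqrt(8)


def frozen_features(x):
    """Run inputs through the frozen backbone (its weights are never updated)."""
    return np.maximum(x @ W_backbone, 0.0)


def train_head(feats, labels, lr=0.5, steps=1000):
    """Fit a new logistic-regression head on top of the frozen features."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
        grad = probs - labels  # gradient of the logistic loss w.r.t. the logits
        w -= lr * feats.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b


# Small synthetic "domain-specific" dataset with binary labels.
x = rng.normal(size=(200, 8))
y = (x[:, 0] + x[:, 1] > 0).astype(float)

feats = frozen_features(x)
w_head, b_head = train_head(feats, y)
preds = (feats @ w_head + b_head > 0).astype(float)
accuracy = (preds == y).mean()
print(f"training accuracy with a frozen backbone: {accuracy:.2f}")
```

Only the head's parameters change during training, which is why this style of fine-tuning needs far less data and compute than training every layer from scratch.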

Easier & Faster Training
Add state-of-the-art AI to your application with zero coding. No AI framework expertise needed.
Highly Accurate AI
Remove barriers and unlock higher network accuracy by using purpose-built pre-trained models.
Greater Throughput
Reduce deployment costs significantly and achieve high-throughput inference.
Optimized Pre-Trained Models For Computer Vision & Conversational AI
Avoid the time-consuming process of creating and optimizing models from scratch, or of working with unoptimized open source models, and focus on your solution instead. TLT speeds up engineering efforts by over 10x by using NVIDIA production-quality models to achieve high throughput and accuracy in less time. These AI models are free and readily available for download from NGC.

Pre-trained Models for Common AI Tasks
Computer Vision Pre-Trained Models
You can jumpstart your AI project by using NVIDIA pre-trained models already built for a variety of industry use cases, speeding up your path from Proof of Concept (PoC) to production. The AI models can be readily used for common computer vision use cases such as counting and detecting people in crowded spaces, detecting and classifying vehicles, license plate detection and recognition at a toll booth, parking management, heart rate monitoring for patients at a healthcare facility, and more.
People Detection
Detect people, bags, and faces in crowded spaces such as transport hubs, improve customer experiences, analyze pedestrian foot traffic, and more.
People Detection Models
PeopleSegNet
A one-class instance segmentation network that detects and segments instances of people in an image.
License Plate Detection & Recognition
Detect and identify vehicle license plates for applications including parking enforcement, automated toll booths, and traffic monitoring.
License Plate Detection & Recognition Models
Vehicle Detection & Classification
Detect the type of vehicle or the make/model of cars for smart city applications.
Vehicle Detection & Classification Models
TrafficCamNet
A four-class object detection network that detects cars and other objects in an image.
DashCamNet
A four-class object detection network that detects cars and other objects in an image. This network is targeted at detecting objects from a moving camera.
VehicleMakeNet
Classify cars into one of 20 popular car brands: Acura, Audi, BMW, Chevrolet, Chrysler, Dodge, Ford, GMC, Honda, Hyundai, Infiniti, Jeep, Kia, Lexus, Mazda, Mercedes, Nissan, Subaru, Toyota, and Volkswagen.
VehicleTypeNet
Classify a vehicle by type: coupe, sedan, SUV, van, large vehicle, or truck.
Gaze Estimation
Estimate where a person is looking using a 3D line of sight.
Gaze Estimation Models
Facial Landmark
Detect key landmarks on the face and track them for tasks such as shape prediction and localizing the face in the image.
Facial Landmark Models
Heart Rate Estimation
Estimate heart rate using computer vision for applications in healthcare and patient monitoring.
Heart Rate Estimation Models
Human Gestures and Emotion
Computer vision tasks for detecting various hand gestures and emotions.
Human Gestures and Emotion Models
Segmentation
Identify each instance of multiple objects in a frame at the pixel level.
Segmentation Models
Instance Segmentation - MaskRCNN
Produce bounding boxes around objects along with segmentation masks.
Semantic Segmentation - UNET
Perform image classification at the pixel level: every pixel in an image is assigned a class label, and all instances of a class share the same label.
PeopleSegNet
A one-class instance segmentation network that detects and segments instances of people in an image.
Text Recognition
Recognizes text from an image.
Text Recognition Models
Object Detection
Detect one or more objects in a frame and place a bounding box around each.
Object Detection Models
DetectNet_v2
DetectNet_v2 is an NVIDIA-optimized object detection architecture designed for high performance.
YOLOv3, YOLOv4, FasterRCNN, SSD/DSSD, RetinaNet
Open model architectures optimized for performance on NVIDIA GPUs.
Image Classification
Easily classify images into designated classes based on the image features. Supported network architectures: ResNet, GoogLeNet, EfficientNet, VGG, DarkNet, MobileNet and CSPDarkNet.
Image Classification Models
Achieve State-of-the-art Accuracy Using Model Architectures
With TLT, you have full flexibility: either bring your own data and fine-tune a model for a specific use case, choosing from 100+ permutations of neural network architectures such as ResNet, VGG, FasterRCNN, RetinaNet, and YOLOv3/v4, or use one of NVIDIA’s multi-purpose production-quality models for common AI tasks instead of going through the hassle of training from scratch.
[Table: matrix of supported network architecture combinations, marked with ✔. One row is labeled 10/18/34/50/101; the remaining row and column labels are not recoverable.]
TLT adapts popular network architectures and backbones to your data, allowing you to train, fine tune, prune and export highly optimized and accurate AI models for high throughput inference.
Deploy State-of-the-Art AI Models
Faster Inference Using Model Pruning & Quantization-Aware Training
Companies building AI solutions are in need of highly accurate AI models that can efficiently make predictions while achieving faster inference within tight memory constraints. Unpruned AI models, in many computer vision use-cases, are not optimized for low power devices. If you are solving a problem with a limited dataset, transfer learning along with select pruning improves channel density for high throughput inference.
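To make the pruning idea above more concrete, here is a minimal sketch of one common approach: removing the convolution filters with the smallest L2 norms. This is a generic magnitude-based criterion chosen for illustration. It is not a description of how TLT selects channels, and `prune_filters` and `keep_ratio` are hypothetical names.

```python
import numpy as np


def prune_filters(weights, keep_ratio):
    """Keep the output filters with the largest L2 norms.

    weights: array of shape (out_channels, in_channels, kH, kW).
    keep_ratio: fraction of output channels to keep (0 < keep_ratio <= 1).
    Returns the pruned weights and the sorted indices of the kept filters.
    """
    out_channels = weights.shape[0]
    n_keep = max(1, int(round(out_channels * keep_ratio)))
    # One norm per output filter, computed over all of its weights.
    norms = np.linalg.norm(weights.reshape(out_channels, -1), axis=1)
    keep = np.sort(np.argsort(norms)[-n_keep:])
    return weights[keep], keep


rng = np.random.default_rng(0)
conv = rng.normal(size=(8, 3, 3, 3))
conv[[1, 4, 6]] *= 0.01  # make three filters nearly redundant

pruned, kept = prune_filters(conv, keep_ratio=0.5)
print("kept filters:", kept.tolist(), "new shape:", pruned.shape)
```

In a full network, the layer that consumes this output would also need its input channels sliced with the same `kept` indices, and a pruned model is normally retrained afterwards to recover any lost accuracy.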

AI models are typically more compute-efficient when executed in lower precision. INT8 inference is significantly faster than floating-point inference, but quantizing FP32/FP16 weights to INT8 after training can reduce model accuracy in some cases because of quantization errors. With the Quantization-Aware Training (QAT) feature of TLT, weights are quantized during the training step, which helps produce accuracy comparable to FP16/FP32 models, unlike post-training quantization. With QAT in TLT, developers can achieve up to 2X inference speedup using INT8 precision while maintaining accuracy comparable to FP16.
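The quantization error mentioned above can be seen in a small sketch of "fake quantization": rounding float weights to the values an INT8 kernel could represent and converting them back. It assumes symmetric per-tensor scaling, which is one common scheme and not necessarily the one TLT uses. The function name `fake_quantize_int8` is hypothetical.

```python
import numpy as np


def fake_quantize_int8(x):
    """Symmetric per-tensor INT8 quantize-then-dequantize.

    Returns values that are still floats but restricted to the levels an
    INT8 tensor could represent, plus the scale that was used.
    """
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127)  # integer levels in [-127, 127]
    return q * scale, scale


rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=(64, 64)).astype(np.float32)

w_q, scale = fake_quantize_int8(weights)
max_error = float(np.max(np.abs(weights - w_q)))
print(f"scale={scale:.6f}, worst-case rounding error={max_error:.6f}")
```

Quantization-aware training typically applies an operation like this in the forward pass during training, so the network learns weights that tolerate the rounding. Post-training quantization applies it only once, after the weights are fixed.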
Unlock peak inference performance with NVIDIA pre-trained models across NVIDIA platforms: Jetson Nano, Xavier NX, AGX Xavier, T4, and Ampere A100 GPUs. For more details on batch size and other models, check the detailed performance datasheet.
Powerful End-to-End Vision AI Pipeline Using DeepStream SDK
Build end-to-end services and solutions for transforming pixels and sensor data into actionable insights using DeepStream SDK and Transfer Learning Toolkit. The production-ready AI models produced by TLT can be easily integrated with NVIDIA DeepStream SDK and TensorRT for high-throughput inference, enabling you to unlock greater performance for a variety of applications including smart cities and hospitals, industrial inspection, logistics, traffic monitoring, and retail analytics.
Unlock highest stream density and deploy at scale using DeepStream SDK
Conversational AI Pre-Trained Models
TLT for Conversational AI includes support for Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) use cases. You can now easily design personalized real-time call center experiences, smart kiosks, and high-quality services for intent recognition, entity recognition, sentiment analysis, and more using readily available pre-trained models from NGC.
Speech Recognition (ASR)
Automatic speech recognition (ASR) takes human voice as input and converts it into readable text.
Speech Recognition (ASR) Models
Jasper
An end-to-end neural automatic speech recognition (ASR) model that transcribes segments of audio to text.
QuartzNet
An end-to-end neural automatic speech recognition (ASR) model that transcribes segments of audio to text.
Natural Language Processing (NLP)
Natural language understanding (NLU) takes text as input, understands context and intent, and uses it to generate an intelligent response.
Natural Language Processing (NLP) Models
BERT NER
Takes a piece of text as input and, for each word in the text, identifies the category the word belongs to.
BERT Punctuation
Predicts the punctuation mark (if any) that should follow each word and whether the word should be capitalized.
BERT Intent and Slot
Classifies the intent of a query and detects all relevant slots (entities) for that intent.
Question Answering BERT Large
BERT Large Uncased model for extractive question answering on any provided content.
Question Answering BERT Base
BERT Uncased model for extractive question answering on any provided content.
Deploy State-of-the-Art Conversational AI Models
Powerful End-to-End AI Pipeline Using JARVIS
Jarvis is a fully accelerated application framework for developers building and deploying multimodal conversational AI services that use state-of-the-art deep learning models in end-to-end deep learning pipelines. Developers at enterprises can easily fine-tune state-of-the-art models on their own data using Transfer Learning Toolkit to achieve a deeper understanding of their specific context. Using optimized pre-trained models and transfer learning in Jarvis, you can train and deploy applications using just one-tenth of the data required by approaches that do not use transfer learning.
Train and deploy end-to-end conversational AI pipeline using Pretrained Models, TLT and Jarvis
Testimonials
“INEX RoadView, our comprehensive automatic license plate recognition system for toll roads, uses NVIDIA’s end-to-end vision AI pipeline, production ready AI models, TLT, and DeepStream SDK. Our engineering team not only slashed the development time by 60% but they also reduced the camera hardware cost by 40% using Jetson Nano and Xavier NX. This enabled our vendors to deploy RoadView, the only out of the box ALPR solution, quickly and reliably. For us, nothing else came close.”
INEX
"KION Group is working on robust AI-based distribution autonomy solutions across its brands, to address operational needs and logistics optimization challenges and greatly reduce flow exception events. Innovation, engineering and digital transformation services are benefiting from optimized NVIDIA pre-trained models while rapidly innovating and fine-tuning models on the fly using Transfer Learning Toolkit and deploying with Nvidia Deepstream unlocking multi-stream density with Jetson platforms."
KION
"At Quantiphi, we use NVIDIA SDKs to build real-time video analytics workflows for many of our Fortune 500 customers across Retail and Media & Entertainment. Transfer Learning Toolkit provides an efficient way to customize training and model pruning for faster edge inference. DeepStream allows us to build high throughput inference pipelines on the Cloud and easily port them to the Jetson NX devices."
Quantiphi
"We are enabling developers and third-party vendors to readily build intelligent AI apps leveraging Optra’s skills marketplace. As a new entrant to the Edge AI market, being able to differentiate our offerings and time to market was crucial. Readily available MaskRCNN from TLT and easy integration into DeepStream saved 25% development effort right out of the box for our R&D team."
Lexmark Ventures
"Using NVIDIA’S TLT made training a real time car detector and license plate detector easy. It eliminated our need to build models from the ground up, resulting in faster development of models and ability to explore options."
Booz Allen Hamilton
"SmartCow is building turnkey AIoT solutions to optimize turnaround time at ports and dry docks. By using TLT, we were able to reduce the training iterations by 9x and reduce the data collection and labeling effort by 5x which significantly reduces our training cost by 2x"
SmartCow
Latest Product News

Developer Tutorial
Learn how to train state-of-the-art models for classification and object detection.

Developer Tutorial
Learn how to create a real-time number plate detection and recognition app.

Developer Webinar
Learn how to create a gesture recognition application with robot interactions.

Community Projects
Learn something new or build your own project. See projects built by our developer community.