Transfer Learning Toolkit

Speed up AI training by over 10x and create highly accurate and efficient domain-specific AI models.


Develop like a pro with zero coding.


Get Started


Creating an AI/ML model from scratch to solve a business problem is capital-intensive and time-consuming. Transfer learning is a popular technique that transfers learned features from an existing neural network model to a new one. The NVIDIA Transfer Learning Toolkit (TLT) abstracts away AI/DL framework complexity and lets you build production-quality models on top of pre-trained ones faster, with no coding required.
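
To make the idea concrete, here is a minimal transfer-learning sketch in plain Keras (the framework TLT builds on, per the FAQ below). It is an illustration only, not the TLT workflow itself, which requires no coding; the class count and dataset path are placeholders.

```python
# Minimal transfer-learning sketch (illustrative only; TLT wraps this kind of
# workflow behind spec files and a CLI, with no coding required).
import tensorflow as tf

NUM_CLASSES = 4  # placeholder: number of classes in your own dataset

# 1. Start from a backbone pre-trained on ImageNet and drop its classifier head.
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet",
    input_shape=(224, 224, 3), pooling="avg")
backbone.trainable = False  # freeze the learned features

# 2. Attach a small task-specific head; only this part is trained.
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# 3. Fine-tune on a (much smaller) domain-specific dataset.
#    A real pipeline would also apply the backbone's preprocess_input transform.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(224, 224), batch_size=32)  # placeholder path
model.fit(train_ds, epochs=5)
```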

A toolkit for anyone building AI apps and services, TLT helps reduce the costs associated with large-scale data collection and labeling, and eliminates the burden of training models from the ground up.

With TLT, you can use NVIDIA’s production-quality pre-trained models and deploy them as-is, or apply minimal fine-tuning, for a variety of computer vision and conversational AI use cases.


Easier & Faster Training

Add state-of-the-art AI to your application with zero coding. No AI framework expertise needed.

Highly Accurate AI

Remove barriers and unlock higher network accuracy by using purpose-built pre-trained models.

Greater Throughput

Reduce deployment costs significantly and achieve high-throughput inference.



Optimized Pre-Trained Models For Computer Vision & Conversational AI

Avoid the time-consuming process of creating and optimizing models from scratch, or of adapting unoptimized open-source models, and focus on your solution instead. TLT speeds up engineering efforts by over 10x by using NVIDIA production-quality models to achieve high throughput and accuracy in less time. These AI models are free and readily available for download from NGC.



Pre-trained Models for Common AI Tasks

Computer Vision Pre-Trained Models

You can jumpstart your AI project by using NVIDIA pre-trained models already built for a variety of industry use cases, speeding up your path from proof of concept (PoC) to production. The AI models can be readily used for common computer vision tasks such as counting and detecting people in crowded spaces, detecting and classifying vehicles, license plate detection and recognition at a toll booth, parking management, heart rate monitoring for patients at a healthcare facility, and more.



People Detection

Detect people, bags, and faces in crowded spaces such as transport hubs, improve customer experiences, analyze pedestrian foot traffic, and more.


License Plate Detection & Recognition

Detect and identify vehicle license plates for applications including parking enforcement, automated toll booths, traffic monitoring, and more.


Vehicle Detection & Classification

Detect the type of vehicle or the make and model of cars for smart city applications.

People Detection Models

PeopleNet

Three-class object detection network to detect people, bags, and faces in an image.

View on NGC

PeopleSegNet

One-class instance segmentation network to detect and segment instances of people in an image.

View on NGC

FaceDetect

Detect faces from an image.

View on NGC

FaceDetect-IR

One-class object detection network to detect faces in an infrared (IR) image.

View on NGC

License Plate Detection & Recognition Models

LPDNet

Object detection network to detect license plates in an image of a car.

View on NGC

LPRNet

Recognize characters from an image of a car license plate.

View on NGC

Vehicle Detection & Classification Models

TrafficCamNet

Four-class object detection network to detect cars and other objects in an image.

View on NGC

DashCamNet

Four-class object detection network to detect cars and other objects in an image. This network is targeted at detecting objects from a moving camera.

View on NGC

VehicleMakeNet

Classify cars into one of 20 popular brands: Acura, Audi, BMW, Chevrolet, Chrysler, Dodge, Ford, GMC, Honda, Hyundai, Infiniti, Jeep, Kia, Lexus, Mazda, Mercedes, Nissan, Subaru, Toyota, and Volkswagen.

View on NGC

VehicleTypeNet

Classify a vehicle into one of six types: coupe, sedan, SUV, van, large vehicle, or truck.

View on NGC

Gaze Estimation

Estimate where a person is looking using a 3D line of sight.


Facial Landmark

Detect key landmarks on the face and track them for shape prediction, face localization in the image, and more.


Heart Rate Estimation

Estimate heart rate using computer vision for applications in healthcare and patient monitoring.

Gaze Estimation Models

Gaze Estimation

Detect a person's eye gaze point of regard and gaze vector.

View on NGC

Facial Landmark Models

Facial Landmarks Estimation

Detect fiducial keypoints from an image of a face.

View on NGC

Heart Rate Estimation Models

HeartRateNet

Estimate a person's heart-rate non-invasively from RGB facial videos.

View on NGC

Human Gestures and Emotion

Computer vision tasks for detecting various hand gestures and emotions.


Segmentation

Identify each instance of multiple objects in a frame at the pixel level.


Text Recognition

Recognize text from an image.

Human Gestures and Emotion Models

EmotionNet

Network to classify emotions from an image of a face.

View on NGC

GestureNet

Classify gestures from cropped images of hands.

View on NGC

Segmentation Models

Instance Segmentation - MaskRCNN

Produce bounding boxes and pixel-level segmentation masks for each detected object.

View on NGC

Semantic Segmentation - UNET

Perform image classification at the pixel level, assigning every pixel in an image to a class label. All instances of a class share the same label.

View on NGC

PeopleSegNet

One-class instance segmentation network to detect and segment instances of people in an image.

View on NGC

Text Recognition Models

Text Recognition

Recognize characters from an image of a car license plate.

View on NGC

Object Detection

Detect one or multiple objects in a frame and place bounding boxes around each object.


Image Classification

Easily classify images into designated classes based on the image features. Supported network architectures: ResNet, GoogLeNet, EfficientNet, VGG, DarkNet, MobileNet and CSPDarkNet.

Object Detection Models

DetectNet_v2

DetectNet_v2 is an NVIDIA-optimized object detection architecture designed for high performance.

View on NGC

YOLOv3, YOLOv4, FasterRCNN, SSD/DSSD, RetinaNet

Open model architectures optimized for performance on NVIDIA GPUs.

View on NGC

Image Classification Models

Easily classify images into designated classes based on the image features. Supported network architectures: ResNet, GoogLeNet, EfficientNet, VGG, DarkNet, MobileNet and CSPDarkNet.

View on NGC

Achieve State-of-the-art Accuracy Using Model Architectures

With TLT, you have full flexibility: bring your own data and fine-tune a model for a specific use case using 100+ permutations of neural network architectures such as ResNet, VGG, FasterRCNN, RetinaNet, and YOLOv3/v4, or use one of NVIDIA’s multi-purpose, production-quality models for common AI tasks instead of going through the hassle of training from scratch. The supported tasks, architectures, and backbones are summarized below.

Image Classification
Object Detection: DetectNet_V2, FasterRCNN, SSD, YOLOV3, YOLOV4, RetinaNet, DSSD
Segmentation: MaskRCNN (instance), UNET (semantic)

Backbones: ResNet 10/18/34/50/101, VGG16/19, GoogLeNet, MobileNet V1/V2, SqueezeNet, DarkNet 19/53, CSPDarkNet 19/53, EfficientNet

TLT adapts popular network architectures and backbones to your data, allowing you to train, fine-tune, prune, and export highly optimized and accurate AI models for high-throughput inference.
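
As a rough sketch of what those permutations mean in code (plain Keras again, not TLT's spec-file mechanism), the same task head can sit on any of several interchangeable backbones, with the backbone reduced to a single parameter:

```python
# Illustrative sketch: one classification head, interchangeable backbones.
import tensorflow as tf

BACKBONES = {
    "resnet50": tf.keras.applications.ResNet50,
    "mobilenet_v2": tf.keras.applications.MobileNetV2,
    "efficientnet_b0": tf.keras.applications.EfficientNetB0,
    "vgg16": tf.keras.applications.VGG16,
}

def build_classifier(backbone_name: str, num_classes: int) -> tf.keras.Model:
    """Attach the same classification head to whichever backbone is requested."""
    backbone = BACKBONES[backbone_name](
        include_top=False, weights="imagenet",
        input_shape=(224, 224, 3), pooling="avg")
    return tf.keras.Sequential([
        backbone,
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_classifier("mobilenet_v2", num_classes=6)  # e.g. six vehicle types
```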



Deploy State-of-the-Art AI Models

Faster Inference Using Model Pruning & Quantization-Aware Training





Companies building AI solutions need highly accurate AI models that can make predictions efficiently and run fast inference within tight memory constraints. In many computer vision use cases, unpruned AI models are not optimized for low-power devices. If you are solving a problem with a limited dataset, transfer learning combined with selective pruning improves channel density for high-throughput inference.

Learn More
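
TLT's built-in pruning step works at the channel level (hence the note on channel density above) and requires no code. Purely as an illustration of the general train-prune-retrain pattern, the sketch below uses TensorFlow's Model Optimization Toolkit, which prunes individual weights by magnitude rather than whole channels:

```python
# General pruning pattern (sketch): train, prune, fine-tune, then strip wrappers.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# ... train the dense (unpruned) model here ...

# Wrap the trained model so that 50% of its weights are gradually zeroed out.
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.5, begin_step=0, end_step=1000)
pruned = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=pruning_schedule)
pruned.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Fine-tune while pruning; UpdatePruningStep advances the sparsity schedule.
# (train_ds is a placeholder dataset.)
# pruned.fit(train_ds, epochs=2,
#            callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Strip the pruning wrappers before exporting the slimmer model for inference.
final_model = tfmot.sparsity.keras.strip_pruning(pruned)
```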


AI models are typically more compute-efficient when executed at lower precision. INT8 inference is significantly faster than floating-point inference, but quantizing FP32/FP16 weights to INT8 after training can reduce model accuracy due to quantization errors. With the Quantization-Aware Training (QAT) feature of TLT, quantizing weights during the training step produces accuracy comparable to FP16/FP32 models, unlike post-training quantization. With QAT in TLT, developers can achieve up to 2x inference speedup using INT8 precision while maintaining accuracy comparable to FP16.

Learn More
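
The core QAT idea (simulating quantization error during training so the weights adapt to it) can be sketched with TensorFlow's Model Optimization Toolkit; TLT applies the same principle inside its own training step and then deploys the result at INT8 through TensorRT, so treat this only as an illustration:

```python
# Quantization-aware training sketch (illustration only): insert fake-quantization
# ops so the model learns weights that survive conversion to INT8.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

base_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Wrap the model with quantization-aware layers; training now "sees" the
# rounding error it will encounter at INT8 inference time.
qat_model = tfmot.quantization.keras.quantize_model(base_model)
qat_model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# Fine-tune briefly with quantization simulated (train_ds is a placeholder).
# qat_model.fit(train_ds, epochs=1)
```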

Inference performance of pre-trained models across Jetson Nano, Jetson Xavier NX, Jetson AGX Xavier, T4, and A100 (FPS):

| Model Architecture | Inference Resolution | Precision | Model Accuracy | Nano GPU (FPS)* | Xavier NX GPU (FPS) | Xavier NX DLA1 (FPS) | Xavier NX DLA2 (FPS) | AGX Xavier GPU (FPS) | AGX Xavier DLA1 (FPS) | AGX Xavier DLA2 (FPS) | T4 GPU (FPS) | A100 GPU (FPS) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PeopleNet-ResNet34 | 960x544x3 | INT8 | 84% | 11 | 182 | 58 | 58 | 314 | 75 | 75 | 1043 | 6001 |
| TrafficCamNet | 960x544x3 | INT8 | 84% | 19 | 264 | 105 | 105 | 478 | 140 | 140 | 1703 | 10054 |
| License Plate Detection | 640x480x3 | INT8 | 98% | 66 | 784 | 194 | 194 | 1370 | 256 | 256 | 5921 | 21931 |
| Facial Landmark | 80x80x1 | FP16 | 6.1 pixel error | 128 | 769 | - | - | 1462 | - | - | 4795 | 23117 |
| GazeNet | 224x224x1, 224x224x1, 224x224x1, 25x25x1 | FP16 | 6.5 RMSE | 104 | 927 | - | - | 1654 | - | - | 5219 | 26534 |
| GestureNet | 160x160x3 | FP16 | 0.85 F1 Score | 96 | 993 | - | - | 1646 | - | - | 5660 | 34086 |

Unlock peak inference performance with NVIDIA pre-trained models across NVIDIA platforms: Jetson Nano, Xavier NX, AGX Xavier, T4, and Ampere A100 GPUs. For details on batch size and other models, see the detailed performance datasheet.



Powerful End-to-End Vision AI Pipeline Using DeepStream SDK







Build end-to-end services and solutions that transform pixels and sensor data into actionable insights using the DeepStream SDK and the Transfer Learning Toolkit. The production-ready AI models produced by TLT integrate easily with the NVIDIA DeepStream SDK and TensorRT for high-throughput inference, unlocking greater performance for a variety of applications including smart cities and hospitals, industrial inspection, logistics, traffic monitoring, retail analytics, and more.

Learn More

Unlock the highest stream density and deploy at scale using the DeepStream SDK

Conversational AI Pre-Trained Models

TLT for conversational AI includes support for Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) use cases. You can easily design personalized real-time call center experiences, smart kiosks, and high-quality services for intent recognition, entity recognition, sentiment analysis, and more using readily available pre-trained models from NGC.


Speech Recognition (ASR)

Automatic speech recognition (ASR) takes human voice as input and converts it into readable text.


Natural Language Processing (NLP)

Natural language understanding (NLU) takes text as input, understands context and intent, and uses it to generate an intelligent response.

Speech Recognition (ASR) Models

Jasper

An end-to-end neural automatic speech recognition (ASR) model that transcribes segments of audio to text.

View on NGC

QuartzNet

An end-to-end neural automatic speech recognition (ASR) model that transcribes segments of audio to text.

View on NGC

Natural Language Processing (NLP) Models

BERT Text Classification

This model classifies documents into predefined categories.

View on NGC

BERT NER

Takes a piece of text as input and, for each word in the text, identifies the category the word belongs to.

View on NGC

BERT Punctuation

Predicts a punctuation mark that should follow the word (if any) and predicts if the word should be capitalized or not.

View on NGC

BERT Intent and Slot

Classifies the intent of a query and detects all relevant slots (entities) for that intent.

View on NGC

Question Answering BERT Large

BERT Large uncased model for extractive question answering on any provided content.

View on NGC

Question Answering BERT Base

BERT Base uncased model for extractive question answering on any provided content.

View on NGC


Deploy State-of-the-Art Conversational AI Models

Powerful End-to-End AI Pipeline Using JARVIS




Jarvis is a fully accelerated application framework for building and deploying multimodal conversational AI services that use state-of-the-art deep learning models in end-to-end pipelines. Enterprise developers can easily fine-tune state-of-the-art models on their own data using the Transfer Learning Toolkit to achieve a deeper understanding of their specific context. Using optimized pre-trained models and transfer learning in Jarvis, you can train and deploy applications with just one-tenth of the data required by non-transfer-learning approaches.

Learn More

Train and deploy an end-to-end conversational AI pipeline using pre-trained models, TLT, and Jarvis






Testimonials


“INEX RoadView, our comprehensive automatic license plate recognition system for toll roads, uses NVIDIA’s end-to-end vision AI pipeline, production ready AI models, TLT, and DeepStream SDK. Our engineering team not only slashed the development time by 60% but they also reduced the camera hardware cost by 40% using Jetson Nano and Xavier NX. This enabled our vendors to deploy RoadView, the only out of the box ALPR solution, quickly and reliably. For us, nothing else came close.”


INEX

"KION Group is working on robust AI-based distribution autonomy solutions across its brands, to address operational needs and logistics optimization challenges and greatly reduce flow exception events. Innovation, engineering and digital transformation services are benefiting from optimized NVIDIA pre-trained models while rapidly innovating and fine-tuning models on the fly using Transfer Learning Toolkit and deploying with Nvidia Deepstream unlocking multi-stream density with Jetson platforms."


KION

"At Quantiphi, we use NVIDIA SDKs to build real-time video analytics workflows for many of our Fortune 500 customers across Retail and Media & Entertainment. Transfer Learning Toolkit provides an efficient way to customize training and model pruning for faster edge inference. DeepStream allows us to build high throughput inference pipelines on the Cloud and easily port them to the Jetson NX devices."


Quantiphi

"We are enabling developers and third-party vendors to readily build intelligent AI apps leveraging Optra’s skills marketplace. As a new entrant to the Edge AI market, being able to differentiate our offerings and time to market was crucial. Readily available MaskRCNN from TLT and easy integration into DeepStream saved 25% development effort right out of the box for our R&D team."


Lexmark Ventures

"Using NVIDIA’S TLT made training a real time car detector and license plate detector easy. It eliminated our need to build models from the ground up, resulting in faster development of models and ability to explore options."


Booz Allen Hamilton

"SmartCow is building turnkey AIoT solutions to optimize turnaround time at ports and dry docks. By using TLT, we were able to reduce the training iterations by 9x and reduce the data collection and labeling effort by 5x which significantly reduces our training cost by 2x"

SmartCow



General FAQ

Are the pre-trained models free to use commercially?
Yes, TLT models are free for commercial use. For specific licensing terms, refer to the model EULA.

Which deep learning frameworks does TLT use?
TLT uses the TensorFlow and Keras frameworks, completely abstracted away from the user. Users operate TLT through documented spec files and do not have to learn a DL framework.

How do I get started with TLT?
Pull the TLT container from NGC. The container comes pre-packaged with Jupyter notebooks and sample spec files for various network architectures. Additional technical resources can be found here.

Can I bring third-party pre-trained models into TLT?
No third-party pre-trained models are supported by TLT. Only NVIDIA pre-trained models from NGC are currently supported.

What hardware can I train and deploy on?
Training with TLT is supported only on x86 systems with an NVIDIA GPU such as a V100. Models trained with TLT can be deployed on any NVIDIA platform, including Jetson.

How do I deploy TLT models with DeepStream?
To deploy trained models on DeepStream, refer to the Deploying to DeepStream chapter of the TLT Getting Started guide.

Do I need to re-train the models before using them?
The purpose-built models can be used as-is, out of the box, and can also be re-trained with your dataset. The architecture-specific models for detection, segmentation, and classification must be re-trained with TLT.

Latest Product News


Developer Tutorial

Learn how to train state-of-the-art models for classification and object detection.

Read Blog

Developer Tutorial

Learn how to create a real-time number plate detection and recognition app.

Read Blog

Developer Webinar

Learn how to create a gesture recognition application with robot interactions.

Watch Now

Community Projects

Learn something new or build your own project. See projects built by our developer community.

Submit A Project