The next version of Transfer Learning Toolkit, with support for conversational AI models, will be available in early 2021. Sign up to be notified of general availability.

NVIDIA Transfer Learning Toolkit

Speed up AI training and create highly accurate, efficient domain-specific AI models. Develop like a pro with zero coding.

Get Started


Key challenges for developers and companies with limited data and deep learning framework expertise are how to train and optimize models to unlock the best inference performance. Building an AI/ML model from scratch to solve a business problem is capital-intensive and time-consuming. Transfer learning is a popular technique that transfers learned features from an existing neural network to a new one, enabling you to start with a pretrained model and retrain it to fit your specific use case rather than starting from zero.

NVIDIA Transfer Learning Toolkit (TLT) offers a hassle-free, zero-coding approach to producing optimized pre-trained AI models, with the flexibility to customize them with your own data. Product builders, developers, and software partners building AI apps and services can use TLT for computer vision and conversational AI use cases.

With TLT, you have full flexibility: bring your own data to fine-tune a model for a specific use case using one of 100+ permutations of neural network architectures such as ResNet, VGG, FasterRCNN, RetinaNet, and YOLOv3/v4, or use one of NVIDIA's multi-purpose, production-quality models for common AI tasks instead of going through the hassle of training from scratch.

TLT’s simple Command Line Interface (CLI) abstracts away the AI framework complexity and enables you to build production quality pre-trained models faster with no coding required.
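As an illustration of the workflow, a typical train, prune, re-train, and export sequence driven from the CLI might look like the sketch below. The `tlt-train`, `tlt-prune`, and `tlt-export` commands follow the TLT CLI naming convention, but the flags, spec files, paths, and `$KEY` shown here are placeholders rather than a verified invocation; consult the TLT documentation for the exact syntax for your network and version.

```
# Illustrative sketch only -- flags and file names are placeholders.
# 1. Train DetectNet_V2 on your data, configured by a spec file
tlt-train detectnet_v2 -e train_spec.txt -r /results -k $KEY

# 2. Prune the trained model to shrink it for edge deployment
tlt-prune -m /results/model.tlt -o /results/model_pruned.tlt -pth 0.3 -k $KEY

# 3. Re-train the pruned model to recover accuracy, then export for deployment
tlt-train detectnet_v2 -e retrain_spec.txt -r /results_retrain -k $KEY
tlt-export detectnet_v2 -m /results_retrain/model.tlt -o model.etlt -k $KEY
```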


Easier & Faster Training

Add state-of-the-art AI to your application with zero coding. No AI framework expertise needed.

Highly Accurate AI

Remove barriers and unlock higher network accuracy by using purpose-built pre-trained models

Greater Throughput

Reduce deployment costs significantly and achieve high throughput inference for vision AI



Optimized Pre-Trained Models For Computer Vision & Conversational AI

AI task-based pre-trained models accelerate the AI training process and help reduce costs associated with large scale data collection, labeling, and training models from scratch. NVIDIA’s multi-purpose pre-trained models are production quality and can be used for various computer vision and conversational AI use-cases.

Avoid the time-consuming process of creating and optimizing models from scratch, or of adapting unoptimized open-source models, and focus on your solution instead. TLT speeds up engineering efforts by over 10x by using NVIDIA production-quality models to achieve high throughput and accuracy in less time. Unlock the highest stream density and deploy at scale using the DeepStream SDK for real-time vision and audio analytics.

Conversational AI applications range from Automatic Speech Recognition (ASR) use cases, such as designing personalized call center experiences and smart kiosks, to Natural Language Processing (NLP) for high-quality services including intent recognition, entity recognition, sentiment analysis, and more. Using Jarvis, you can speed up training and deployment of optimized pretrained models with one tenth of the data you would typically gather with a manual, non-transfer-learning-based approach.


Supported Models For Every Task

Computer Vision Pre-Trained Models

Computer vision use-cases such as counting and detecting people in crowded spaces, detecting and classifying vehicles, license plate detection and recognition at a toll booth, parking management, heart rate monitoring for patients at a healthcare facility, and more are widely applicable across many industries such as smart cities, retail, logistics, manufacturing, and more. You can jumpstart your AI project by using NVIDIA pretrained models already built for a variety of these industry use-cases, speeding up your Proof of Concept (PoC) to production process.


People Detection

Detect people, bags, and faces in crowded spaces such as transport hubs, improve customer experiences, analyze pedestrian foot traffic, and more.


License Plate Detection & Recognition

Detect and identify vehicle license plates for various applications, including parking enforcement, automated toll booths, and traffic monitoring.


Vehicle Detection & Classification

Detect the type of vehicle or the make/model of cars for smart city applications.


Gaze Estimation

Estimate where a person is looking using a 3D line of sight.


Facial Landmark

Detect key landmarks on the face and track them for shape prediction, localizing the face in the image, and more.


Heart Rate Estimation

Estimates heart rate using Computer vision for applications in healthcare and patient monitoring.


Human Gestures and Emotion

Computer vision tasks for detecting various hand gestures and emotions.


Segmentation

Identify each instance of multiple objects in a frame at the pixel level.


Text Recognition

Recognizes text from an image.

People Detection Models

PeopleNet

Three-class object detection network to detect people in an image.

View on NGC

PeopleSegNet

One-class instance segmentation network to detect and segment instances of people in an image.

View on NGC

FaceDetect

Detect faces from an image.

View on NGC

FaceDetect-IR

One-class object detection network to detect faces in an IR image.

View on NGC

License Plate Detection & Recognition Models

LPDNet

Object Detection network to detect license plates in an image of a car.

View on NGC

LPRNet

Recognize characters from an image of a car license plate.

View on NGC

Vehicle Detection & Classification Models

TrafficCamNet

Four-class object detection network to detect cars and other objects in an image.

View on NGC

DashCamNet

Four-class object detection network to detect cars and other objects in an image. This network is targeted at detecting objects from a moving camera.

View on NGC

VehicleMakeNet

Classify cars into one of 20 popular brands: Acura, Audi, BMW, Chevrolet, Chrysler, Dodge, Ford, GMC, Honda, Hyundai, Infiniti, Jeep, Kia, Lexus, Mazda, Mercedes, Nissan, Subaru, Toyota, and Volkswagen.

View on NGC

VehicleTypeNet

Classify a vehicle as a coupe, sedan, SUV, van, large vehicle, or truck.

View on NGC

Gaze Estimation Models

Gaze Estimation

Detect a person's eye gaze point of regard and gaze vector.

View on NGC

Facial Landmark Models

Facial Landmarks Estimation

Detect fiducial keypoints from an image of a face.

View on NGC

Heart Rate Estimation Models

HeartRateNet

Estimate a person's heart-rate non-invasively from RGB facial videos.

View on NGC

Human Gestures and Emotion Models

EmotionNet

Network to classify emotions from a face image.

View on NGC

GestureNet

Classify gestures from cropped hand images.

View on NGC

Segmentation Models

Instance Segmentation - MaskRCNN

View on NGC

Semantic Segmentation - UNET

View on NGC

Text Recognition Models

Text Recognition

Recognize characters from an image of a car license plate.

View on NGC

Conversational AI Pre-Trained Models

Use state-of-the-art deep learning models trained for more than 100,000 hours on NVIDIA DGX™ systems for speech, language understanding, and vision tasks. You can fine-tune these models for your domain with your data using Transfer Learning Toolkit before easily deploying them as services.


Speech Recognition (ASR)

Automatic speech recognition (ASR) takes human voice as input and converts it into readable text.


Natural Language Processing (NLP)

Natural language understanding (NLU) takes text as input, understands context and intent, and uses it to generate an intelligent response.

Speech Recognition (ASR) Models

Jasper

An end-to-end neural automatic speech recognition (ASR) model that transcribes segments of audio to text.

View on NGC

QuartzNet

An end-to-end neural automatic speech recognition (ASR) model that transcribes segments of audio to text.

View on NGC

Natural Language Processing (NLP) Models

BERT Text Classification

This model classifies documents into predefined categories.

View on NGC

BERT NER

Takes a piece of text as input and, for each word in the text, identifies the category the word belongs to.

View on NGC

BERT Punctuation

Predicts the punctuation mark (if any) that should follow each word, and whether the word should be capitalized.

View on NGC

BERT Intent and Slot

Classifies the intent of a query and detects all relevant slots (entities) for that intent.

View on NGC

Question Answering BERT Large

BERT Large uncased model for extractive question answering on any provided content.

View on NGC

Question Answering BERT Base

BERT Base uncased model for extractive question answering on any provided content.

View on NGC

Achieve State-of-the-art Accuracy Using Model Architectures

TLT adapts popular network architectures and backbones to your data, allowing you to train, fine-tune, prune, and export highly optimized, accurate AI models for edge deployment.

Tasks: Image Classification | Object Detection | Segmentation

Object detection networks: DetectNet_V2, FasterRCNN, SSD, DSSD, YOLOv3, YOLOv4, RetinaNet

Segmentation networks: MaskRCNN (instance), UNET (semantic)

Backbones: ResNet 10/18/34/50/101, VGG16/19, GoogLeNet, MobileNet V1/V2, SqueezeNet, DarkNet 19/53, CSPDarkNet 19/53, EfficientNet




Deploy State-of-the-Art AI Models

Faster Inference Using Model Pruning & Quantization-Aware Training

Companies building AI solutions need highly accurate models that predict efficiently and deliver fast inference within tight memory constraints. Unpruned AI models are often not optimized for low-power devices. If you are solving a problem with a limited dataset, transfer learning combined with selective pruning improves channel density for high-throughput inference.
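TLT's actual pruning criterion is internal to the toolkit, but the general idea of magnitude-based channel pruning can be sketched in plain Python: channels whose weight norm falls below a threshold fraction of the largest norm are removed, leaving a smaller, faster network. The function name and threshold below are illustrative, not TLT's API.

```python
import math

def prune_channels(channels, pth=0.3):
    """Drop channels whose L2 weight norm falls below pth * max norm.

    `channels` is a list of per-channel weight lists; returns the
    indices of channels that survive pruning.
    """
    norms = [math.sqrt(sum(w * w for w in ch)) for ch in channels]
    threshold = pth * max(norms)
    return [i for i, n in enumerate(norms) if n >= threshold]

# Toy example: 4 channels, two of which carry little weight magnitude
layer = [
    [0.9, -1.1, 0.8],    # strong channel
    [0.01, 0.02, 0.0],   # weak channel -> pruned
    [1.2, 0.7, -0.9],    # strong channel
    [0.05, -0.03, 0.0],  # weak channel -> pruned
]
print(prune_channels(layer))  # -> [0, 2]
```

In a real network, the surviving channel indices would be used to rebuild the layer's weight tensors, after which fine-tuning recovers any accuracy lost to pruning.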

Learn More

Powerful End-to-End AI Systems

AI models represented in lower precision are more compute-efficient: INT8 models run inference significantly faster than their floating-point counterparts. However, quantizing FP32/FP16 weights to INT8 after training can reduce model accuracy due to quantization errors. With TLT's Quantization-Aware Training (QAT), weights are quantized during the training step, producing accuracy comparable to FP16/FP32 models rather than the degraded accuracy of post-training quantization. Using QAT in TLT, developers can achieve up to 2x inference speedup while maintaining accuracy comparable to FP16.
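To see where post-training quantization error comes from, here is a minimal stdlib-Python sketch of symmetric INT8 quantization (a conceptual illustration, not TLT's calibration algorithm): each FP32 weight is rounded to one of 255 signed levels, and the rounding gap is the error that QAT lets the network absorb during training.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map [-max|w|, max|w|] to [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.82, -0.431, 0.0057, 0.299, -1.27]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
error = max(abs(w - r) for w, r in zip(weights, recovered))
print(round(error, 4))  # worst-case rounding error, bounded by scale / 2
```

The small weight 0.0057 lands on the nearest quantization level and comes back noticeably shifted; accumulated over millions of weights, such shifts are what degrade post-training-quantized models and what training-time quantization compensates for.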

Learn More

DeepStream SDK

Build end-to-end services and solutions for transforming pixels and sensor data to actionable insights using DeepStream SDK and Transfer Learning Toolkit.

The production-ready AI models produced by TLT can be easily integrated with NVIDIA DeepStream and TensorRT for high-throughput inference, enabling you to unlock performance for a variety of applications including smart cities and hospitals, industrial inspection, logistics, traffic monitoring, and retail analytics.

TLT pruning improves channel density for high throughput inference.

Learn More

Jarvis

Jarvis is a fully accelerated application framework for developers building and deploying multimodal conversational AI services that use state-of-the-art deep learning models and an end-to-end deep learning pipeline. Enterprise developers can easily fine-tune state-of-the-art models on their data using Transfer Learning Toolkit to achieve a deeper understanding of their specific context. Jarvis provides tools to deploy these highly accurate models as optimized, real-time end-to-end services that run the entire conversation pipeline in less than 300 milliseconds (ms), versus the several seconds needed by CPUs, and deliver 7x higher throughput on GPUs compared with CPUs.

Learn More
| Model Architecture | Inference Resolution | Precision | Model Accuracy | Jetson Nano GPU (FPS)* | Jetson Xavier NX GPU (FPS) | Jetson Xavier NX DLA1 (FPS) | Jetson Xavier NX DLA2 (FPS) | Jetson AGX Xavier GPU (FPS) | Jetson AGX Xavier DLA1 (FPS) | Jetson AGX Xavier DLA2 (FPS) | T4 GPU (FPS) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PeopleNet-ResNet18 | 960x544 | INT8 | 80% | 14 | 218 | 72 | 72 | 384 | 94 | 94 | 1105 |
| PeopleNet-ResNet34 | 960x544 | INT8 | 84% | 10 | 157 | 51 | 51 | 272 | 67 | 67 | 807 |
| TrafficCamNet-ResNet18 | 960x544 | INT8 | 84% | 19 | 261 | 105 | 105 | 464 | 140 | 140 | 1300 |
| DashCamNet-ResNet18 | 960x544 | INT8 | 80% | 18 | 252 | 102 | 102 | 442 | 133 | 133 | 1280 |
| FaceDetect-IR-ResNet18 | 384x240 | INT8 | 96% | 95 | 1188 | 570 | 570 | 2006 | 750 | 750 | 2520 |
| VehicleTypeNet-ResNet18 ⊺ | 224x224 | INT8 | 96% | 120 | 1333 | 678 | 678 | 3047 | 906 | 906 | 11918 |
| VehicleMakeNet-ResNet18 ⊺ | 224x224 | INT8 | 91% | 173 | 1871 | 700 | 700 | 3855 | 945 | 945 | 15743 |

Greater end-to-end throughput using Transfer Learning Toolkit and DeepStream SDK
* FP16 inference on Jetson Nano
⊺ Throughput measured using trtexec and does not reflect end-to-end performance

For Jarvis Models




Testimonials


Using NVIDIA's TLT made training a real-time car detector and license plate detector easy. It eliminated our need to build models from the ground up, resulting in faster development of models and the ability to explore options.


Booz Allen Hamilton

SmartCow is building turnkey AIoT solutions to optimize turnaround time at ports and dry docks. By using TLT, we were able to reduce training iterations by 9x and reduce the data collection and labeling effort by 5x, which significantly reduces our training cost by 2x.

SmartCow


General FAQ

Are TLT models free for commercial use?
Yes, TLT models are free for commercial use. For specific licensing terms, refer to the model EULA.

Which deep learning framework does TLT use?
TLT uses the TensorFlow and Keras frameworks, completely abstracted away from the user. Users operate TLT through documented spec files and do not need to learn a DL framework.
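For illustration, a spec file is plain-text configuration, and a fragment like the one below shows its general shape. The section and field names here (`training_config`, `batch_size_per_gpu`, `num_epochs`) are assumptions drawn from typical TLT examples, not an exact schema; refer to the sample spec files shipped in the container for the real syntax of each network.

```
# Hypothetical fragment -- consult the sample spec files for the real schema.
training_config {
  batch_size_per_gpu: 4
  num_epochs: 120
}
```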

How do I get started with TLT?
Pull the TLT container from NGC. The container comes pre-packaged with Jupyter notebooks and sample spec files for various network architectures. Additional technical resources can be found here.

Can I use third-party pre-trained models with TLT?
No, third-party pre-trained models are not supported by TLT. Only NVIDIA pre-trained models from NGC are currently supported.

Where can I train and deploy TLT models?
Training with TLT is supported only on x86 with an NVIDIA GPU such as a V100. Models trained with TLT can be deployed on any NVIDIA platform, including Jetson.

How do I deploy trained models on DeepStream?
To deploy trained models on DeepStream, refer to the Deploying to DeepStream chapter of the TLT Getting Started Guide.

Do the pre-trained models need to be re-trained?
The purpose-built models can be used as-is, out of the box, or re-trained with your dataset. The architecture-specific models for detection, segmentation, and classification must be re-trained with TLT.

When will conversational AI models be supported?
Support for speech and NLU models in TLT will be enabled in early 2021. Sign up here to be notified when the next version of TLT is available.

Latest Product News


Developer Tutorial

Learn how to train a 90-class COCO MaskRCNN model with TLT and deploy it on DeepStream using TensorRT.

Try Today




Developer Tutorial

Learn how to train an AI model with Quantization Aware Training in Transfer Learning Toolkit.

Read Now

NVIDIA GTC

BMW research group showcases the use of the NVIDIA ISAAC SDK and TLT for building smart transport robots.

Learn More

Community Projects

Learn something new or build your own project. See projects built by our developer community.

Submit A Project