NVIDIA TAO Toolkit

Now, you can boost your AI development by 10X, without a huge investment in AI expertise. The NVIDIA Train, Adapt, and Optimize (TAO) Toolkit gives you a faster, easier way to accelerate training and quickly create highly accurate and performant, domain-specific AI models.



NVIDIA TAO Toolkit was previously named NVIDIA Transfer Learning Toolkit.


Creating an AI/ML model from scratch to solve a business problem can cost a lot of time and money. Transfer learning is a popular technique for transferring learned features from an existing neural network model to a new one. The NVIDIA TAO Toolkit simplifies this process by abstracting away the AI/DL framework complexity. This accelerates development by enabling developers to fine-tune high-quality NVIDIA pre-trained models with only a fraction of the data needed to train from scratch.

With the TAO Toolkit, you can use NVIDIA’s production-quality pre-trained models and either deploy them as-is or apply minimal fine-tuning for various computer vision and conversational AI use cases. The TAO Toolkit is a core component of NVIDIA's TAO platform for AI model adaptation.

[Figure: TAO Toolkit software stack diagram]

Easier Training

Add state-of-the-art AI to your application with minimal coding. No AI framework expertise needed.

Highly Accurate AI

Remove barriers and unlock higher network accuracy by using purpose-built pre-trained models

Greater Throughput

Reduce deployment costs significantly and achieve high inference throughput.

Train Anywhere

The TAO Toolkit works seamlessly on any infrastructure where your data and compute are located.



Quickly Create Customized, Production-Ready Models for Computer Vision & Conversational AI

Avoid the time-consuming process of creating and optimizing models from scratch, or of using unoptimized open-source models, and spend more time enhancing your application. The TAO Toolkit speeds up engineering efforts by over 10x by using NVIDIA’s production-quality pre-trained models to achieve high throughput and accuracy in a fraction of the time. These AI models are free and readily available for download from the NGC catalog.



Pre-trained Models for Common AI Tasks

Computer Vision Pre-Trained Models

You can jumpstart your AI project with NVIDIA pre-trained models already built for a variety of industry use cases, speeding up the path from proof of concept (PoC) to production. The models can be used as-is for common computer vision use cases such as counting and detecting people in crowded spaces, detecting and classifying vehicles, detecting and recognizing license plates at toll booths, managing parking, monitoring the heart rate of patients at healthcare facilities, and more.



People Detection

Detect people, bags, and faces in crowded spaces such as transport hubs to improve customer experiences, analyze pedestrian foot traffic, and more.


License Plate Detection & Recognition

Detect and recognize vehicle license plates for applications such as parking enforcement, automated toll booths, and traffic monitoring.


Vehicle Detection & Classification

Detect vehicles and classify their type or make for smart city applications.

People Detection Models

PeopleNet

Three-class object detection network that detects people, bags, and faces in an image.

View on NGC

PeopleSegNet

1 class instance segmentation network to detect and segment instances of people in an image.

View on NGC

PeopleSemSegNet

1 class semantic segmentation network to segment people from background.

View on NGC

FaceDetect

Detect faces in an image.

View on NGC

FaceDetect-IR

1-class object detection network to detect faces in an IR image.

View on NGC

License Plate Detection & Recognition Models

LPDNet

Object detection network that detects license plates in an image of a car.

View on NGC

LPRNet

Recognize characters from an image of a car license plate.

View on NGC

Vehicle Detection & Classification Models

TrafficCamNet

A four-class object detection network that detects cars and other objects in an image.

View on NGC

DashCamNet

A four-class object detection network that detects cars and other objects in an image. This network is designed to detect objects from a moving camera.

View on NGC

VehicleMakeNet

Classify cars into 1 of 20 popular car brands such as Acura, Audi, BMW, Chevrolet, Chrysler, Dodge, Ford, GMC, Honda, Hyundai, Infiniti, Jeep, Kia, Lexus, Mazda, Mercedes, Nissan, Subaru, Toyota, and Volkswagen.

View on NGC

VehicleTypeNet

Classify a vehicle's type as coupe, sedan, SUV, van, large vehicle, or truck.

View on NGC

Pose Estimation

Identify key joints on a person's body.


Gaze Estimation

Estimate where a person is looking, as a 3D line of sight.


Facial Landmark

Detect and track key facial landmarks for tasks such as face shape prediction and localizing the face in the image.

Pose Estimation Models

Pose Estimation

Identify key joints on a person's body.

View on NGC

Gaze Estimation Models

Gaze Estimation

Detect a person's eye gaze point of regard and gaze vector.

View on NGC

Facial Landmark Models

Facial Landmarks Estimation

Detect fiducial keypoints from an image of a face.

View on NGC

Heart Rate Estimation

Estimate heart rate using computer vision for healthcare and patient-monitoring applications.


Human Gestures and Emotion

Computer vision tasks for detecting hand gestures and emotions.


Segmentation

Identify each instance of multiple objects in a frame at the pixel level.

Heart Rate Estimation Models

HeartRateNet

Estimate a person's heart-rate non-invasively from RGB facial videos.

View on NGC

Human Gestures and Emotion Models

EmotionNet

Network that classifies emotions from a face.

View on NGC

GestureNet

Classify gestures from cropped hand images.

View on NGC

Segmentation Models

Instance Segmentation - MaskRCNN

Produce a bounding box and a segmentation mask for each object instance.

View on NGC

Semantic Segmentation - UNET

Perform image classification at the pixel level by assigning every pixel in an image a class label. All instances of a class share the same label.

View on NGC
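The difference between semantic and instance segmentation is easiest to see in the output. As a minimal sketch, assuming the network produces one score map per class (a common design, not confirmed here for the UNET models in TAO Toolkit), a semantic label map comes from a per-pixel argmax. The function name and the score values are hypothetical.

```python
def semantic_labels(scores):
    """Return labels[y][x] = argmax over classes c of scores[c][y][x].

    scores is a list of per-class 2D score maps (hypothetical layout).
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]

# Hypothetical scores for a 2x3 image with class 0 = background and
# class 1 = person. Two separate people occupy the left and right columns.
scores = [
    [[0.2, 0.9, 0.1], [0.3, 0.8, 0.2]],  # background scores
    [[0.8, 0.1, 0.9], [0.7, 0.2, 0.8]],  # person scores
]
labels = semantic_labels(scores)
print(labels)  # [[1, 0, 1], [1, 0, 1]]
```

Both people receive the same label 1. An instance segmentation model such as MaskRCNN would instead return a separate mask for each person.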

PeopleSegNet

1 class instance segmentation network to detect and segment instances of people in an image.

View on NGC

PeopleSemSegNet

1 class semantic segmentation network to segment people from background.

View on NGC

Text Recognition

Recognize text in an image.


Object Detection

Detect one or multiple objects in a frame and place bounding boxes around the object.


Image Classification

Easily classify images into designated classes based on the image features. Supported network architectures: ResNet, GoogLeNet, EfficientNet, VGG, DarkNet, MobileNet and CSPDarkNet.

Text Recognition Models

Text Recognition

Recognize characters from an image of a car license plate.

View on NGC

Object Detection Models

DetectNet_v2

DetectNet_v2 is an NVIDIA-optimized object detection architecture designed for high performance.

View on NGC

YOLOv3, YOLOv4, FasterRCNN, SSD/DSSD, RetinaNet

Open model architectures optimized for performance on NVIDIA GPUs.

View on NGC

Image Classification Models

Easily classify images into designated classes based on the image features. Supported network architectures: ResNet, GoogLeNet, EfficientNet, VGG, DarkNet, MobileNet and CSPDarkNet.

View on NGC

Achieve State-of-the-art Accuracy Using Model Architectures

The TAO Toolkit lets you bring your own data to fine-tune a model for a specific use case, using 100+ permutations of neural network architectures such as ResNet, VGG, FasterRCNN, RetinaNet, and YOLOv3/v4. Or you can use one of NVIDIA’s multi-purpose, production-quality models for common AI tasks instead of going through the hassle of training from scratch.

Supported tasks: Image Classification, Object Detection, Segmentation

Object detection architectures: DetectNet_v2, FasterRCNN, SSD, YOLOv3, YOLOv4, RetinaNet, DSSD

Segmentation architectures: MaskRCNN, UNET

Backbones: ResNet 10/18/34/50/101, VGG16/19, GoogLeNet, MobileNet V1/V2, SqueezeNet, DarkNet 19/53, CSPDarkNet 19/53, EfficientNet B0/B1

TAO Toolkit adapts popular network architectures and backbones to your data, allowing you to train, fine-tune, prune, and export highly optimized and accurate AI models for high-throughput inference.



Deploy State-of-the-Art AI Models

Faster Inference Using Model Pruning & Quantization-Aware Training





Companies building AI solutions need highly accurate models that make predictions efficiently, delivering fast inference within tight memory constraints. In many computer vision use cases, unpruned models are not optimized for low-power devices. If you are solving a problem with a limited dataset, combining transfer learning with selective pruning removes redundant channels and yields a more compact model for high-throughput inference.

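The paragraph above does not spell out how pruning decides which channels to drop. As a minimal sketch, assuming a generic magnitude-based criterion (not necessarily TAO Toolkit's exact algorithm), each output channel can be scored by the L2 norm of its weights, normalized by the largest norm in the layer, and dropped when that score falls below a threshold. The function names, the normalization, and the threshold value are hypothetical.

```python
import math

def channel_norms(filters):
    """L2 norm of each output channel's flattened weights."""
    return [math.sqrt(sum(w * w for w in f)) for f in filters]

def prune_channels(filters, threshold):
    """Keep channels whose norm, divided by the layer's largest norm, is >= threshold.

    Returns (kept_indices, kept_filters). The max-normalization and the
    threshold are illustrative assumptions, not TAO's documented criterion.
    """
    norms = channel_norms(filters)
    max_norm = max(norms)
    if max_norm == 0:
        return [], []
    kept = [i for i, n in enumerate(norms) if n / max_norm >= threshold]
    return kept, [filters[i] for i in kept]

# Four output channels (hypothetical weights); channels 1 and 3 carry
# almost no signal and are candidates for removal.
layer = [
    [0.9, -0.4, 0.3],
    [0.01, 0.0, -0.02],
    [-0.7, 0.6, 0.2],
    [0.05, -0.03, 0.0],
]
kept, pruned = prune_channels(layer, threshold=0.1)
print(kept)  # [0, 2]
```

In a real network, removing an output channel also removes the matching input channel of the next layer, and the pruned model is usually retrained to recover any lost accuracy.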


AI models typically run more efficiently at lower precision. INT8 inference is significantly faster than floating-point inference, but quantizing FP32/FP16 weights to INT8 after training (post-training quantization) can reduce model accuracy in some cases because of quantization error. With the Quantization-Aware Training (QAT) feature of TAO Toolkit, weights are quantized during the training step, which helps the INT8 model reach accuracy comparable to FP16/FP32 models, compared with post-training quantization. With QAT in TAO Toolkit, developers can achieve up to 2X inference speedup using INT8 precision while maintaining accuracy comparable to FP16.

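To see where quantization error comes from, here is a minimal sketch of symmetric per-tensor INT8 quantization. The quantize-then-dequantize step (often called fake quantization) is the kind of operation QAT inserts into the forward pass so the network learns weights that tolerate rounding; during backpropagation, QAT implementations commonly pass gradients straight through the rounding. The scheme (per-tensor, symmetric, range [-127, 127]) and the function names are assumptions for illustration; production toolchains may use other variants, such as per-channel scales.

```python
def int8_scale(values):
    """Symmetric per-tensor scale that maps the largest |value| to 127."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0

def quantize(values, scale):
    """Round each value to the nearest INT8 step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(q, scale):
    """Map INT8 codes back to floating point."""
    return [x * scale for x in q]

def fake_quantize(values):
    """Quantize then dequantize, as in a simulated-quantization forward pass."""
    s = int8_scale(values)
    return dequantize(quantize(values, s), s)

weights = [0.5, -1.27, 0.003, 0.126]  # hypothetical FP32 weights
print(quantize(weights, int8_scale(weights)))  # [50, -127, 0, 13]
approx = fake_quantize(weights)
errors = [abs(w - a) for w, a in zip(weights, approx)]
```

Each rounding error is at most half a quantization step (scale / 2). The small weight 0.003 is rounded to zero, which illustrates how post-training quantization can lose information that QAT lets the network learn to compensate for.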

Inference throughput of NVIDIA pre-trained models (all throughput values in frames per second, FPS)

Model Arch | Inference Resolution | Precision | Model Accuracy | Nano GPU* | TX2 NX GPU* | Xavier NX GPU | Xavier NX DLA1 | Xavier NX DLA2 | AGX Xavier GPU | AGX Xavier DLA1 | AGX Xavier DLA2 | T4 GPU | A100 GPU
PeopleNet-ResNet34 | 960x544x3 | INT8 | 84% mAP | 11 | 31 | 182 | 58 | 58 | 314 | 75 | 75 | 1043 | 6001
TrafficCamNet | 960x544x3 | INT8 | 84% mAP | 19 | 51 | 264 | 105 | 105 | 478 | 140 | 140 | 1703 | 9520
LPD | 640x480x3 | INT8 | 98% mAP | 66 | 178 | 770 | 194 | 194 | 1370 | 256 | 256 | 5921 | 21931
Facial Landmark | 80x80x1 | FP16 | 6.1 pixel error | 125 | 319 | 747 | - | - | 1451 | - | - | 4735 | 23117
GazeNet | 224x224x1, 224x224x1, 224x224x1, 25x25x1 | FP16 | 6.5 RMSE | 98 | 280 | 923 | - | - | 1627 | - | - | 5219 | 26534
People Semantic Segmentation | 960x544x3 | INT8 | 92% mIoU | 1.4 | 6 | 17 | 9 | 9 | 28 | 12 | 12 | 103 | 519
2D Body Pose Estimation | 288x384x3 | INT8 | 56% mAP | 5 | 12 | 97 | - | - | 166 | - | - | 563 | 2686

Unlock peak inference performance with NVIDIA pre-trained models across NVIDIA platforms: Jetson Nano, Jetson TX2 NX, Jetson Xavier NX, Jetson AGX Xavier, and T4 and Ampere A100 GPUs. For details on batch size and other models, check the detailed performance datasheet.

Note: * Jetson Nano and TX2 NX results use FP16 inference.



Powerful End-to-End Vision AI Pipeline Using DeepStream SDK







Build end-to-end services and solutions that transform pixels and sensor data into actionable insights using DeepStream SDK and TAO Toolkit. The production-ready AI models produced by TAO Toolkit integrate easily with NVIDIA DeepStream SDK and TensorRT for high-throughput inference, unlocking greater performance for applications such as smart cities and hospitals, industrial inspection, logistics, traffic monitoring, and retail analytics.


Unlock the highest stream density and deploy at scale using DeepStream SDK

Conversational AI Pre-Trained Models

TAO Toolkit for Conversational AI supports Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) use cases. Using readily available pre-trained models from NGC, you can easily design personalized real-time call center experiences, smart kiosks, and high-quality services for intent recognition, entity recognition, sentiment analysis, and more.


Speech Recognition (ASR)

Automatic speech recognition (ASR) takes human voice as input and converts it into readable text.


Natural Language Processing (NLP)

Natural language understanding (NLU) takes text as input, understands context and intent, and uses it to generate an intelligent response.

Speech Recognition (ASR) Models

Jasper

An end-to-end neural automatic speech recognition (ASR) model that transcribes segments of audio to text.

View on NGC

QuartzNet

An end-to-end neural automatic speech recognition (ASR) model that transcribes segments of audio to text.

View on NGC

CitriNet

An optimized and smaller version of QuartzNet for end-to-end automatic speech recognition (ASR) tasks.

View on NGC

N-gram Language Model

Estimates the probability distribution over sequences of words. It is used together with the ASR model to score how likely each word is in the context of a sentence.

View on NGC
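To make "probability distribution over sequences of words" concrete, here is a minimal sketch of a bigram (2-gram) language model estimated by counting, then used to score a sentence as a product of conditional word probabilities. This is an unsmoothed maximum-likelihood toy; production n-gram LMs for ASR typically use higher orders and smoothing, and the function names and corpus here are hypothetical.

```python
from collections import Counter

def train_bigram(sentences):
    """Count contexts and word pairs, with sentence-start/end markers."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens[:-1])                   # every token used as a context
        bigrams.update(zip(tokens[:-1], tokens[1:]))   # adjacent word pairs
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

def sentence_prob(unigrams, bigrams, sentence):
    """P(sentence) as a product of bigram probabilities."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p *= bigram_prob(unigrams, bigrams, prev, word)
    return p

corpus = ["turn on the lights", "turn off the lights", "turn on the fan"]
uni, bi = train_bigram(corpus)
print(sentence_prob(uni, bi, "turn on the lights"))  # 4/9, about 0.444
print(sentence_prob(uni, bi, "turn on the lites"))   # 0.0
```

When an acoustic model produces similar-sounding hypotheses such as "lights" and "lites", the language model score helps the decoder prefer the word sequence that is more plausible in context.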

Natural Language Processing (NLP) Models

BERT Text Classification

This model classifies documents into predefined categories.

View on NGC

BERT NER

Takes a piece of text as input and, for each word in the text, identifies the category the word belongs to.

View on NGC

BERT Punctuation

Predicts a punctuation mark that should follow the word (if any) and predicts if the word should be capitalized or not.

View on NGC
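The punctuation model above produces two predictions per word. As a minimal sketch, assuming those predictions arrive as one punctuation mark (or an empty string) and one capitalization flag per word (a hypothetical output layout, not the model's documented format), the restored text can be rebuilt like this:

```python
def apply_punct_capit(words, punct, capit):
    """Rebuild text from per-word predictions.

    punct[i]: punctuation mark to append after words[i] ("" for none).
    capit[i]: True if words[i] should be capitalized.
    """
    out = []
    for word, p, c in zip(words, punct, capit):
        w = word[:1].upper() + word[1:] if c else word
        out.append(w + p)
    return " ".join(out)

# Hypothetical ASR output (lowercase, unpunctuated) and model predictions.
words = ["hello", "my", "name", "is", "anna", "how", "are", "you"]
punct = [",", "", "", "", ".", "", "", "?"]
capit = [True, False, False, False, True, True, False, False]
print(apply_punct_capit(words, punct, capit))
# Hello, my name is Anna. How are you?
```

This is why the model is often placed after ASR: raw transcripts are typically lowercase and unpunctuated, and the per-word predictions restore readable text.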

BERT Intent and Slot

Classifies the intent of a query and detects all relevant slots (entities) for that intent.

View on NGC

Question Answering Bert Large

BERT-Large uncased model for extractive question answering on any provided content.

View on NGC

Question Answering Bert Base

BERT-Base uncased model for extractive question answering on any provided content.

View on NGC

Question Answering Megatron

Megatron uncased model for question answering, trained on the SQuAD v2.0 question answering dataset.

View on NGC


Deploy State-of-the-Art Conversational AI Models

Powerful End-to-End AI Pipeline Using Riva




Riva is a fully accelerated application framework for developers building and deploying multimodal conversational AI services using state-of-the-art deep learning models. Enterprise developers can easily fine-tune state-of-the-art models on their own data using TAO Toolkit to achieve higher accuracy in their specific context. Using optimized pre-trained models and transfer learning, you can train and deploy applications with just 1/10th of the data required by approaches that do not use transfer learning.

[Figure: TAO conversational AI workflow diagram]

Train and deploy an end-to-end conversational AI pipeline using pre-trained models, TAO Toolkit, and Riva




Data Generation & Data Annotation Partners

Training AI requires large amounts of high-quality labeled data, so we have partnered with several companies that provide data creation and annotation to accelerate training.

Sama

Offers end-to-end synthetic computer vision solutions for object detection and image classification

Hasty.ai

Annotation solution using AI to significantly speed up labeling.

Sky Engine

Next-generation, self-learning AI system for image and video analysis applications.

Lightly

Offers a data curation platform that helps you select the best data for your use case

Appen

Provides high-quality training data for a variety of use cases

Labelbox

Collaborative data training platform to create and manage labeled data for machine learning applications

Sama

Provides high quality training data, validation and annotation solutions for AI and machine learning models



Testimonials


“INEX RoadView, our comprehensive automatic license plate recognition system for toll roads, uses NVIDIA’s end-to-end vision AI pipeline, production ready AI models, TAO Toolkit, and DeepStream SDK. Our engineering team not only slashed the development time by 60% but they also reduced the camera hardware cost by 40% using Jetson Nano and Xavier NX. This enabled our vendors to deploy RoadView, the only out of the box ALPR solution, quickly and reliably. For us, nothing else came close.”


INEX

"KION Group is working on robust AI-based distribution autonomy solutions across its brands, to address operational needs and logistics optimization challenges and greatly reduce flow exception events. Innovation, engineering and digital transformation services are benefiting from optimized NVIDIA pre-trained models while rapidly innovating and fine-tuning models on the fly using TAO Toolkit and deploying with Nvidia Deepstream unlocking multi-stream density with Jetson platforms."


KION

"At Quantiphi, we use NVIDIA SDKs to build real-time video analytics workflows for many of our Fortune 500 customers across Retail and Media & Entertainment. TAO Toolkit provides an efficient way to customize training and model pruning for faster edge inference. DeepStream allows us to build high throughput inference pipelines on the Cloud and easily port them to the Jetson NX devices."


Quantiphi

"We are enabling developers and third-party vendors to readily build intelligent AI apps leveraging Optra’s skills marketplace. As a new entrant to the Edge AI market, being able to differentiate our offerings and time to market was crucial. Readily available MaskRCNN from the TAO Toolkit and easy integration into DeepStream saved 25% development effort right out of the box for our R&D team."


Lexmark Ventures

"Using NVIDIA’S TAO Toolkit made training a real time car detector and license plate detector easy. It eliminated our need to build models from the ground up, resulting in faster development of models and ability to explore options."


Booz Allen Hamilton

"SmartCow is building turnkey AIoT solutions to optimize turnaround time at ports and dry docks. By using the TAO Toolkit, we were able to reduce the training iterations by 9x and reduce the data collection and labeling effort by 5x which significantly reduces our training cost by 2x"


SmartCow

“CVEDIA’s synthetic algorithm technology accelerates development of object detection and image classification networks. By using NVIDIA's TAO Toolkit, we cut model training time in half and achieved the same level of model accuracy and throughput performance”


CVEDIA



General FAQ

Transfer Learning Toolkit has been rebranded as TAO Toolkit. All functionality and models from TLT will continue to work with TAO Toolkit.
Yes, TAO Toolkit models are free for commercial use. For specific licensing terms, refer to each model's EULA.
TAO Toolkit is built on the TensorFlow and PyTorch frameworks, which are completely abstracted away from the user. Users operate TAO Toolkit through documented spec files and do not need to learn the underlying DL frameworks.
Getting started with TAO Toolkit is easy: see the TAO Toolkit Getting Started guide. In addition, you can find Jupyter notebooks for all vision models under NGC resources. For more information about TAO Toolkit, check out the TAO-Toolkit-CV collection and the TAO-Toolkit-Conversational AI collection.
TAO Toolkit does not support third-party pre-trained models. Currently, only NVIDIA pre-trained models from NGC are supported.
TAO Toolkit supports training only on x86 systems with an NVIDIA GPU, such as a V100. Models trained with TAO Toolkit can be deployed on any NVIDIA platform, including Jetson.
To deploy trained models on DeepStream, refer to the Deploying to DeepStream chapter of the TAO Toolkit Getting Started guide.
The purpose-built models can be used as-is out of the box and can also be retrained with your dataset. The architecture-specific models for detection, segmentation, and classification must be retrained with TAO Toolkit.
NVIDIA Train, Adapt, and Optimize (TAO) is an AI-model-adaptation platform that simplifies and accelerates the creation of enterprise AI applications and services. By fine-tuning pretrained models with custom data through a UI-based, guided workflow, you can produce highly accurate computer vision, speech, and language understanding models in hours rather than months, eliminating the need for large training runs and deep AI expertise.
TAO Toolkit, along with other technologies such as Federated Learning and TensorRT, is a core part of the TAO platform. The TAO platform lets users train, adapt, and optimize their models through an easy-to-use UI powered by a guided workflow. TAO Toolkit, on the other hand, is a standalone product that lets users optimize models from the command line in the TAO Toolkit environment they are familiar with.
TAO Toolkit will continue to be developed and supported as a standalone product.
The early access program for NVIDIA TAO is currently active. This is a great opportunity to work closely with our product team and help shape the product. You can sign up for early access.

Latest Product News


Developer Tutorial

Learn how to train and optimize a pose estimation model for real-time inference.

Read Blog - Part 1 Read Blog - Part 2

Developer Tutorial

Learn how to create a real-time number plate detection and recognition app.

Read Blog

Supercharge Your AI Workflows With Transfer Learning

Learn how NVIDIA TAO Toolkit and Pretrained Models can transform your development efforts.

Read Whitepaper

Explore TAO Platform

NVIDIA TAO is an AI-model-adaptation platform that simplifies and accelerates the creation of enterprise AI.


GTC21 Talk

Learn how the world’s top AI teams combine pre-trained models and transfer learning tools to supercharge their AI vision development.

Watch Now

GTC21 Talk

Learn how to build and deploy a custom conversational AI app with NVIDIA TAO Toolkit and Riva

Watch Now

Success Story

Learn how Lexmark uses pre-trained models, TAO Toolkit, and DeepStream to reduce its AI skills design cycle by 25%.

Read Blog

Success Story

Explore how INEX leverages pre-trained models, TAO Toolkit, and DeepStream to reduce development time and cost for toll road systems.

Read More