NVIDIA TAO
NVIDIA TAO is a framework for customizing vision foundation models for high accuracy and performance with fine-tuning microservices. TAO’s suite of modular microservices helps you easily adapt and optimize vision AI models for specific domains or tasks. This dramatically reduces the time and data you need to build high-performing AI solutions that are ready for deployment from the edge to the cloud. 
At the heart of TAO is a collection of vision foundation models, multimodal models, and pre-trained vision models built on vast, commercially relevant datasets. Applicable across various industries, TAO excels at delivering custom industrial AI models for visual inspection, quality control, and robotic guidance.
How TAO Works
The NVIDIA TAO workflow shows how developers can go from model training to production deployment in a seamless pipeline. The process begins with selecting a pretrained foundation model from the TAO model zoo or bringing a third-party model with an architecture supported by TAO. Next, developers adapt the model to their domain by fine-tuning and optimizing it to be smaller and faster at runtime. Finally, the trained models can be exported into open formats for deployment across diverse environments—from edge to cloud—with NVIDIA DeepStream Inference Builder. This structured workflow ensures that high-performing AI models can be quickly customized, optimized, and deployed at scale.
TAO Documentation
Browse the documentation and learn how to get started with TAO.
Key Features
Scale Custom Model Development With New Vision Foundation Models
Use high-performance vision foundation models as a general-purpose starting point for developing a variety of downstream vision tasks, like classification, detection, segmentation, and more. You can customize models for domain or task-specific vision applications across industries based on training data availability and performance requirements.
Achieve High Accuracy With Advanced Training Techniques
Apply advanced training and fine-tuning capabilities, including self-supervised learning (SSL), to learn from unlabeled, unstructured data. This shortens training time and reduces annotation costs. You can also post-train third-party models whose architectures TAO supports.
Increase Inference Throughput With Knowledge Distillation
Use knowledge distillation to compress large models into efficient, edge-ready versions with minimal reduction in accuracy. 
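Knowledge distillation is commonly implemented with a soft-target loss. The student is trained to match the teacher's temperature-softened output distribution while still fitting the ground-truth labels. The sketch below shows that classic formulation in plain Python for a single sample. It is a generic illustration, not TAO's implementation, and the `temperature` and `alpha` defaults are illustrative assumptions.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      label: int,
                      temperature: float = 4.0,   # illustrative value, not a TAO default
                      alpha: float = 0.5) -> float:  # illustrative weighting between the two terms
    """Soft-target loss: alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence from the softened teacher distribution to the softened student distribution.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Ordinary cross-entropy against the hard label, computed at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # T^2 rescales the soft-target gradients so both terms stay comparable as T changes.
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard


teacher = [2.0, 1.0, 0.1]
print(distillation_loss(teacher, teacher, label=0))          # student agrees with teacher
print(distillation_loss([0.1, 1.0, 2.0], teacher, label=0))  # student disagrees: larger loss
```

When the student reproduces the teacher's logits, the KL term vanishes and only the hard-label term remains. Any disagreement with the teacher raises the loss.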
Reduce Data Preparation Times With TAO Data Services
Manage, process, and prepare datasets for AI model training with services that streamline the data pipeline, including tools for data ingestion, auto-labeling, and conversion to formats optimized for NVIDIA TAO.
Deploy Anywhere, Run Efficiently
With fine-tuning microservices (FTMS) and DeepStream Inference Builder, TAO standardizes training and deployment for all supported models, whether inference runs on the edge or in the cloud. It provides training job orchestration and status monitoring, and it uses AutoML to search automatically for the best hyperparameters.
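At its core, AutoML hyperparameter search follows a simple loop:

- propose a configuration;
- score it with a training run;
- keep the best configuration seen so far.

The minimal sketch below uses plain random search as a stand-in, and TAO's AutoML may use different search strategies. The search space, `random_search`, and `toy_objective` are hypothetical placeholders. In a real setup, the objective would be a short fine-tuning run that returns a validation metric.

```python
import random
from typing import Callable


def random_search(objective: Callable[[dict], float],
                  space: dict[str, list],
                  trials: int,
                  seed: int = 0) -> tuple[dict, float]:
    """Sample `trials` configurations uniformly from `space`; return the best (config, score)."""
    rng = random.Random(seed)  # seeded for reproducible searches
    best_cfg: dict = {}
    best_score = float("-inf")
    for _ in range(trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = objective(cfg)  # hypothetical: validation accuracy after a short training run
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


# Hypothetical search space for a fine-tuning job.
space = {"learning_rate": [1e-3, 1e-4, 1e-5], "batch_size": [8, 16, 32]}


def toy_objective(cfg: dict) -> float:
    # Stand-in for "train, then evaluate": peaks at learning_rate=1e-4, batch_size=16.
    return -abs(cfg["learning_rate"] - 1e-4) * 1e4 - abs(cfg["batch_size"] - 16) / 16


best, score = random_search(toy_objective, space, trials=200)
print(best, score)
```

Replacing random search with a smarter strategy only changes how each `cfg` is proposed. The evaluate-and-keep-best loop stays the same.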
TAO Models
Vision Foundation Models
Use vision foundation models (VFMs) as pretrained starting points, making it easy to fine-tune models for domain-specific tasks and deploy them efficiently at scale.
Pre-Trained Vision Models
Easily combine pretrained vision models with a foundation model for tasks like detection, segmentation, classification, and change detection, streamlining domain-specific customization.
- Real-time detection (RT-DETR) 
- Semantic segmentation (SegFormer)
- Visual change detection (Visual ChangeNet) 
Depth Estimation Models
Use monocular and stereo depth estimation foundation models that generalize well zero-shot.
Multimodal Vision Models
Use multimodal vision models to combine vision (image and video) data with text to perform tasks like feature extraction, detection, or segmentation.
Get Started With TAO
Set Up Your System
Check that your machine meets the system requirements, then install TAO to get started.
TAO GitHub Tutorials and Notebooks
Check out extended resources and Jupyter notebooks for TAO.
Performance
Unlock peak inference performance with NVIDIA pretrained models across platforms, from NVIDIA Jetson™ at the edge to NVIDIA data center GPUs in the cloud. The table below lists inference throughput for each model on each platform (higher is better). For batch sizes and additional models, see the detailed performance datasheet.
| Model Architecture | Model Variant | Inference Resolution | Precision | NVIDIA DGX Spark | NVIDIA Jetson AGX Thor™ | NVIDIA L40S | NVIDIA A100 | NVIDIA RTX PRO 6000 SE | NVIDIA H200 | NVIDIA B200 | NVIDIA HGX™ GB200 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| C-RADIOv2 Classification | Large 322M | 3x224x224 | FP16 | 297 | 635 | 1453 | 1520 | 2443 | 3579 | 5781 | 6018 | 
| NV-DINOv2 | Large 305M | 3x224x224 | FP16 | 207 | 413 | 1020 | 1048 | 1747 | 2542 | 5667 | 5957 | 
| RT-DETR+C-RADIOv2 | Base 147M | 3x640x640 | FP16 | 204 | 248 | 670 | Awaiting Results | 1204 | 1934 | 3316 | Awaiting Results | 
| SegFormer+C-RADIOv2 | Base 92M | 3x640x640 | FP16 | 254 | 264 | 1155 | 1330 | 1960 | 2746 | 3187 | 3386 | 
| Multi-Golden ChangeNet Classification+C-RADIOv2 | Base | 3x224x224 | FP16 | Awaiting Results | Awaiting Results | 332 | 418 | 525 | 847 | 820 | 867 | 
| NV-DepthAnythingv2 | Large 360M | 3x518x924 | FP32+FP16 | Awaiting Results | 25 | 66 | 70 | 108 | 176 | 320 | 320 | 
| C-FoundationStereo | Small 221M | 2x3x320x736 | FP16 | 2.3 | 1.5 | 1.0 | Awaiting Results | 18 | 20 | 19 | Awaiting Results | 
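The table does not state batch size or units, so the safest comparisons are within a single row, where settings match. The sketch below normalizes one row against a baseline platform and skips "Awaiting Results" entries. It assumes that:

- the values are throughput, where higher is better;
- every platform in a row was measured under the same settings.

`relative_throughput` is a hypothetical helper, not part of TAO.

```python
from typing import Optional

# Values copied from the RT-DETR+C-RADIOv2 (Base 147M, 3x640x640, FP16) row above.
# None stands for "Awaiting Results".
RT_DETR_ROW: dict[str, Optional[float]] = {
    "NVIDIA DGX Spark": 204,
    "NVIDIA Jetson AGX Thor": 248,
    "NVIDIA L40S": 670,
    "NVIDIA A100": None,
    "NVIDIA RTX PRO 6000 SE": 1204,
    "NVIDIA H200": 1934,
    "NVIDIA B200": 3316,
    "NVIDIA HGX GB200": None,
}


def relative_throughput(row: dict[str, Optional[float]], baseline: str) -> dict[str, float]:
    """Divide each reported value in `row` by the baseline platform's value, skipping missing entries."""
    base = row.get(baseline)
    if base is None:
        raise ValueError(f"no result reported for baseline {baseline!r}")
    return {name: round(value / base, 2) for name, value in row.items() if value is not None}


ratios = relative_throughput(RT_DETR_ROW, "NVIDIA L40S")
for platform, ratio in ratios.items():
    print(f"{platform}: {ratio}x the L40S value")
```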
Starter Kits
Accelerated Computing Hub 
Visit the Accelerated Computing Hub to see examples of CUDA in action in C++ and Python. You’ll find tutorials and example code that will help you learn more about how to use CUDA.
- Download C-RADIOv2
- Download NV-DINOv2
Fine-Tuning 
Take advantage of Supervised Fine-Tuning (SFT) with labeled data and Self-Supervised Learning (SSL) with unlabeled data.
Model Distillation
Distill knowledge from a larger teacher model into a smaller student model sized for your target compute budget.
AI-Assisted Auto Labeling 
Use prompts and descriptors to auto-label data with object detection bounding boxes and segmentation masks.
Depth Estimation 
Access high-accuracy monocular and stereo depth estimation models.
- Download FoundationStereo 
- Download NV-DepthAnythingv2
Model Deployment 
Optimize inference with the DeepStream SDK.
- Deploy With DeepStream Inference Builder 
More Resources
Ethical AI
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this model in accordance with our terms of service, developers should work with their supporting model team to ensure the model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.