
Build a Real-Time Visual Inspection Pipeline with NVIDIA TAO 6 and NVIDIA DeepStream 8

Building a robust visual inspection pipeline for defect detection and quality control is not easy. Manufacturers and developers often face challenges such as customizing general-purpose vision AI models for specialized domains, optimizing model size for compute-constrained edge devices, and deploying for real-time inference at maximum throughput.

NVIDIA Metropolis is a development platform for vision AI agents and applications that helps to solve these challenges. Metropolis provides the models and tools to build visual inspection workflows spanning multiple stages, including: 

  • Customizing vision foundation models through fine-tuning
  • Optimizing the models for real‑time inference
  • Deploying the models into production pipelines 

NVIDIA Metropolis provides a unified framework and includes NVIDIA TAO 6 for training and optimizing vision AI foundation models, and NVIDIA DeepStream 8, an end-to-end streaming analytics toolkit. NVIDIA TAO 6 and NVIDIA DeepStream 8 are now available for download. Learn more about the latest feature updates in the NVIDIA TAO documentation and NVIDIA DeepStream documentation.

This post walks you through how to build an end-to-end real-time visual inspection pipeline using NVIDIA TAO and NVIDIA DeepStream. The steps include:

  • Performing self-supervised fine-tuning with TAO to leverage domain-specific unlabeled data.
  • Optimizing foundation models using TAO knowledge distillation for better throughput and efficiency.
  • Deploying using DeepStream Inference Builder, a low-code tool that turns model ideas into production-ready, standalone applications or deployable microservices.

How to scale custom model development with vision foundation models using NVIDIA TAO

NVIDIA TAO supports the end-to-end workflow for training, adapting, and optimizing large vision foundation models for domain specific use cases. It’s a framework for customizing vision foundation models to achieve high accuracy and performance with fine-tuning microservices.

Flow diagram showing an overview of the end-to-end scope of NVIDIA TAO.
Figure 1. Use NVIDIA TAO to create highly accurate, customized, and enterprise-ready AI models to power your vision AI applications

Vision foundation models (VFMs) are large-scale neural networks trained on massively diverse datasets to capture generalized and powerful visual feature representations. This generalization makes them a flexible model backbone for a wide variety of AI perception tasks such as image classification, object detection, and semantic segmentation. 

TAO provides a collection of these powerful foundation backbones and task heads to fine-tune models for your key workloads like industrial visual inspection. The two key foundation backbones in TAO 6 are C-RADIOv2 (highest out-of-the-box accuracy) and NV-DINOv2. TAO also supports third-party models, provided their vision backbone and task head architectures are compatible with TAO.

The diagram shows the TAO fine-tuning workflow. It starts with a foundation backbone that learns image features from your dataset, followed by task head layers (classification, detection, segmentation) that use these feature maps to generate final predictions.
Figure 2. Scale custom vision model development with NVIDIA TAO fine-tuning framework, foundation model backbones, and task heads

To boost model accuracy, TAO supports multiple model customization techniques such as supervised fine-tuning (SFT) and self-supervised learning (SSL). SFT requires annotated datasets curated for the specific downstream computer vision task, and collecting high-quality labeled data is a complex, manual process that is time-consuming and expensive.

With self-supervised learning, NVIDIA TAO 6 empowers you to tap into the vast potential of unlabeled images and accelerate model customization where labeled data is scarce or expensive to acquire.

This approach, also called domain adaptation, enables you to build a robust foundation model backbone such as NV-DINOv2 with unlabeled data. This backbone can then be combined with a task head and fine-tuned for various downstream inspection tasks with a smaller annotated dataset.

In practical scenarios, this workflow means a model can learn the nuanced characteristics of defects from plentiful unlabeled images, then sharpen its decision-making with targeted supervised fine-tuning, delivering state-of-the-art performance even on customized, real-world datasets.
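Conceptually, the self-supervised stage needs no labels: the backbone learns by matching its own predictions across different augmented views of the same image. The following minimal PyTorch sketch illustrates the general DINO-style self-distillation idea behind this kind of domain adaptation. It is a conceptual illustration only (the data loader yielding two augmented views is assumed, and details such as output centering are omitted), not the TAO training code.

import copy
import torch
import torch.nn.functional as F
import torchvision

# Conceptual DINO-style self-distillation loop (not the TAO implementation).
# `unlabeled_loader` is assumed to yield two augmented views of each unlabeled image.
backbone = torchvision.models.resnet50(weights=None)   # stand-in for the foundation backbone
projection_head = torch.nn.Linear(1000, 4096)
student = torch.nn.Sequential(backbone, projection_head)
teacher = copy.deepcopy(student)                        # teacher is an EMA copy of the student
for p in teacher.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
momentum, t_student, t_teacher = 0.996, 0.1, 0.04

for view1, view2 in unlabeled_loader:
    # Student sees one view, teacher sees the other; no labels are involved.
    s_logits = student(view1) / t_student
    with torch.no_grad():
        t_probs = F.softmax(teacher(view2) / t_teacher, dim=-1)

    # Cross-entropy between the (sharpened) teacher and student distributions.
    # Real DINO also centers the teacher outputs; omitted here for brevity.
    loss = -(t_probs * F.log_softmax(s_logits, dim=-1)).sum(dim=-1).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Update the teacher as an exponential moving average of the student.
    with torch.no_grad():
        for ps, pt in zip(student.parameters(), teacher.parameters()):
            pt.mul_(momentum).add_(ps, alpha=1 - momentum)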

A diagram showing the two stages used to adapt and fine-tune a large-scale pretrained foundation model for a specific downstream task.
Figure 3. End-to-end workflow to adapt a foundation model for a specific downstream use case

Boosting PCB defect detection accuracy with foundation model fine-tuning

As an example, we applied the TAO foundation model adaptation workflow to fine-tune a vision foundation model for defect detection using large-scale unlabeled printed circuit board (PCB) images. Starting with NV-DINOv2, a general-purpose model trained on 700 million general images, we customized it with SSL for PCB applications using a dataset of ~700,000 unlabeled PCB images. This helped transition the model from broad generalization to sharp domain-specific proficiency.

Once domain adaptation was complete, we leveraged an annotated PCB dataset, using linear probing to refine the task-specific head for accuracy, and full fine-tuning to further adjust both the backbone and the classification head. This labeled dataset consisted of around 600 training and 400 testing samples, categorizing images as OK or Defect (including patterns such as missing, shifts, upside-down, poor soldering, and foreign objects).
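At a high level, linear probing and full fine-tuning differ only in which parameters are allowed to update and at what learning rate. The short PyTorch sketch below illustrates that distinction with placeholder names (adapted_backbone stands in for the domain-adapted backbone, and the feature dimension is assumed); it is not TAO's API, which drives these stages through spec files.

import torch
import torch.nn as nn

# Generic illustration of linear probing vs. full fine-tuning.
# `adapted_backbone` stands in for the domain-adapted backbone and is assumed
# to output a feature vector of size `feat_dim`; all names here are placeholders.
feat_dim, num_classes = 1024, 2                  # e.g., OK vs. Defect
classifier_head = nn.Linear(feat_dim, num_classes)
model = nn.Sequential(adapted_backbone, classifier_head)

# Stage 1: linear probing -- freeze the backbone and train only the head.
for p in adapted_backbone.parameters():
    p.requires_grad = False
probe_optimizer = torch.optim.AdamW(classifier_head.parameters(), lr=1e-3)

# Stage 2: full fine-tuning -- unfreeze everything, with a smaller backbone LR.
for p in adapted_backbone.parameters():
    p.requires_grad = True
finetune_optimizer = torch.optim.AdamW([
    {"params": adapted_backbone.parameters(), "lr": 1e-5},
    {"params": classifier_head.parameters(), "lr": 1e-4},
])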

Feature maps show that the adapted NV-DINOv2 can sharply distinguish components and separate foreground from background (Figures 4 and 5) even before downstream fine-tuning. It excels at separating complex items like integrated circuit (IC) pins from the background, a task that is not possible with a general model.

Two side-by-side images comparing the features from a generic NV-DINOv2 model versus a domain-adapted NV-DINOv2 model, computed for a PCB image of the OK class.
Figure 4. A comparison of feature maps for the OK class using the domain-adapted NV-DINOv2 (left) and the general NV-DINOv2 (right)
Two side-by-side images comparing the features from a generic NV-DINOv2 model versus a domain-adapted NV-DINOv2 model, computed for a PCB image of the Defect class.
Figure 5. A comparison of feature maps for the Defect class using the domain-adapted NV-DINOv2 (left) and the general NV-DINOv2 (right)

This results in a substantial classification accuracy improvement of 4.7 percentage points, from 93.8% to 98.5%.

Plot showing evolution of accuracy over the number of epochs during training when starting from a generic NV-DINOv2 vs an NV-DINOv2 checkpoint that’s domain adapted on unlabeled images.
Figure 6. Accuracy comparison between the domain-adapted and generic NV-DINOv2

The domain-adapted NV-DINOv2 also shows strong visual understanding and extracts relevant image features within the same domain. This indicates that similar or better accuracy can be achieved with less labeled data during downstream supervised fine-tuning.

In certain scenarios, gathering a dataset as large as 0.7 million unlabeled images could still be challenging. However, you can still benefit from NV-DINOv2 domain adaptation even with a smaller dataset.

Figure 7 shows the results of running an experiment adapting NV-DINOv2 with just 100K images, which also outperforms the general NV-DINOv2 model.

Plot comparing accuracy convergence over the duration of training (in epochs) when starting from a generic NV-DINOv2 (in green), a domain-adapted NV-DINOv2 with 100K images (in blue), and a domain-adapted NV-DINOv2 with 700K images (in orange).
Figure 7. Accuracy comparison between different NV-DINOv2 models for classification

This example illustrates how leveraging self-supervised learning on unlabeled domain data using NVIDIA TAO with NV-DINOv2 can yield robust, accurate PCB defect inspection while reducing reliance on large amounts of labeled samples.

How to optimize vision foundation models for better throughput

Optimization is an important step in deploying deep learning models. Many generative AI and vision foundation models have hundreds of millions of parameters, which makes them compute-hungry and too large for most edge devices used in real-time applications such as industrial visual inspection or traffic monitoring systems.

NVIDIA TAO leverages knowledge from these larger foundation models and optimizes them into smaller models using a technique called knowledge distillation. Knowledge distillation compresses large, highly accurate teacher models into smaller, faster student models, often without losing accuracy. The process works by having the student mimic not just the final predictions, but also the internal feature representations and decision boundaries of the teacher, making deployment practical on resource-constrained hardware and enabling scalable model optimization.
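The core idea of logit distillation can be written in a few lines: the student is trained to match the teacher's softened output distribution in addition to the ground-truth labels. The generic PyTorch sketch below illustrates this principle; it is not the distillation code TAO runs internally.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Generic logit-distillation loss: hard-label CE plus softened KL to the teacher."""
    # Standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # KL divergence between softened student and teacher distributions.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    soft_loss = soft_loss * (temperature ** 2)   # rescale to keep gradient magnitudes comparable

    # Blend the two terms; the teacher is typically kept frozen during training.
    return alpha * hard_loss + (1.0 - alpha) * soft_loss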

NVIDIA TAO takes knowledge distillation further with its robust support for different forms of distillation, including backbone, logit, and spatial/feature distillation. A standout feature in TAO is its single-stage distillation approach, designed specifically for object detection. With this streamlined process, a student model, often much smaller and faster, learns both backbone representations and task-specific predictions directly from the teacher in one unified training phase. This enables dramatic reductions in inference latency and model size without sacrificing accuracy.

Applying single-stage distillation for a real-time PCB defect detection model

The effectiveness of distillation using TAO was evaluated on a PCB defect detection dataset comprising 9,602 training images and 1,066 test images, covering six challenging defect classes: missing hole, mouse bite, open circuit, short, spur, and spurious copper. Two distinct teacher model candidates were used to evaluate the distiller. The experiments were performed with backbones initialized from ImageNet-1K pretrained weights, and results were measured using the standard COCO mean Average Precision (mAP) metric for object detection.
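For reference, COCO mAP can be computed from detections exported in COCO JSON format using the pycocotools package; the file paths below are placeholders for your own ground-truth and prediction files.

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths: COCO-format ground truth and detection results.
coco_gt = COCO("annotations/test.json")
coco_dt = coco_gt.loadRes("predictions.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()                    # prints the full COCO metric table
print("COCO mAP@[.50:.95]:", evaluator.stats[0])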

Flow diagram with icons labeled (clockwise from bottom center) Data, Teacher Model, Knowledge, and Student Model.
Figure 8. Use NVIDIA TAO to distill knowledge from a larger teacher model into a smaller student model 

In our first set of experiments, we ran distillation using the ResNet series of backbones for the teacher-student combination, where the accuracy of the student model not only matches but can even exceed the accuracy of its teacher.

The baseline experiments are run as train actions associated with the RT-DETR model in TAO. The following snippet shows a minimum viable experiment spec file that you can use to run a training experiment. 

model:
  backbone: resnet_50
  train_backbone: true
  num_queries: 300
  num_classes: 7

train:
  num_gpus: 1
  epochs: 72
  batch_size: 4
  optim:
    lr: 1e-4
    lr_backbone: 1.0e-05

dataset:
  train_data_sources:
    - image_dir: /path/to/dataset/images/train
      json_file: /path/to/dataset/annotations/train.json
  val_data_sources:
    image_dir: /path/to/dataset/images/val
    json_file: /path/to/dataset/annotations/val.json
  test_data_sources:
    image_dir: /path/to/dataset/images/test
    json_file: /path/to/dataset/annotations/test.json
  batch_size: 4
  remap_coco_categories: false
  augmentation:
    multiscales: [640]
    train_spatial_size: [640, 640]
    eval_spatial_size: [640, 640]

To run the train action, use the following command:

tao model rtdetr train -e /path/to/experiment/spec.yaml results_dir=/path/to/results/dir model.backbone=backbone_name model.pretrained_backbone_path=/path/to/the/pretrained/model.pth

You can change the backbone by overriding the model.backbone parameter to the name of the backbone and model.pretrained_backbone_path to the path to the pretrained checkpoint file for the backbone. 

A distillation experiment is run as a distill action associated with the RT-DETR model in TAO. To configure the distill experiment, you can add the following config element to the original train experiment spec file.

distill:
  teacher:
    backbone: resnet_50
  pretrained_teacher_model_path: /path/to/the/teacher/checkpoint.pth

Run distillation using the following sample command:

tao model rtdetr distill -e /path/to/experiment/spec.yaml results_dir=/path/to/results/dir model.backbone=backbone_name model.pretrained_backbone_path=/path/to/pretrained/backbone/checkpoint.pth distill.teacher.backbone=teacher_backbone_name distill.pretrained_teacher_model_path=/path/to/the/teacher/model.pth
Graph showing a ResNet50 teacher model distilled into a lighter ResNet18 student model, achieving a 5% accuracy gain.
Figure 9. Distilling a ResNet50 model into a lighter ResNet18 model yields a 5% accuracy gain 

When deploying a model at the edge, both inference speed and memory footprint are significant considerations. TAO enables distilling detection features not just within the same family of backbones, but also across backbone families.

Graph showing a ConvNeXt teacher model distilled into a lighter ResNet34-based student model, achieving a 3% accuracy gain.
Figure 10. Distilling a ConvNeXt model into a lighter ResNet34-based model yields a 3% accuracy gain 

In this example, we used a ConvNeXt-based RT-DETR model as the teacher and distilled it into a lighter ResNet34-based model. Through single-stage distillation, TAO improved accuracy by 3% while reducing the model size by 81%, for higher-throughput, lower-latency inference.

How to package and deploy models with DeepStream 8 Inference Builder

Now with a trained and distilled RT-DETR model from TAO, the next step is to deploy it as an inference microservice. The new NVIDIA DeepStream 8 Inference Builder is a low‑code tool that turns model ideas into standalone applications or deployable microservices. 

To use the Inference Builder, provide a YAML configuration, a Dockerfile and an optional OpenAPI definition. The Inference Builder then generates Python code that connects the data loading, GPU‑accelerated preprocessing, inference, and post‑processing stages, and can expose REST endpoints for microservice deployments.  

It is designed to automate the generation of inference service code, API layers, and deployment artifacts from a user-provided model and configuration files. This eliminates the need to manually write boilerplate code for servers, request handling, and data flow; a simple configuration is enough for Inference Builder to manage these complexities.

Video 1. Learn how to deploy AI models using the NVIDIA DeepStream Inference Builder

Step 1: Define the configuration

  • Create a config.yaml file to delineate your model and inference pipeline
  • (Optional) Incorporate an openapi.yaml file if explicit API schema definition is desired

Step 2: Execute the DeepStream Inference Builder

  • Submit the configuration to Inference Builder
  • This utility leverages inference templates, server templates, and utilities (codec, for example) to autonomously generate project code
  • The output constitutes a comprehensive package, encompassing inference logic, server code, and auxiliary utilities
  • Output infer.tgz, a packaged inference service

Step 3: Examine the generated code

The package expands into a meticulously organized project, featuring:

  • Configuration: config/
  • Server logic: server/
  • Inference library: lib/
  • Utilities: asset manager, codec, responders, and so on

Step 4: Construct a Docker image

  • Use the reference Dockerfile to containerize the service
  • Execute docker build -t my-infer-service . from the project directory (the trailing dot sets the build context)

Step 5: Deploy with Docker Compose

  • Initiate the service using Docker Compose: docker-compose up
  • The service will subsequently load your models within the container

Step 6: Serve to users

  • Your inference microservice is now operational
  • End users or applications can dispatch requests to the exposed API endpoints and receive predictions directly from your model, as sketched in the example below
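Once the microservice is running, any HTTP client can call it. The Python snippet below is a hypothetical example; the endpoint path, port, and request fields are assumptions for illustration, so check the OpenAPI definition generated for your service for the actual schema.

import requests

# Hypothetical client for the generated inference microservice.
# The URL, port, and payload fields are illustrative assumptions; consult the
# service's generated OpenAPI definition for the real request schema.
SERVICE_URL = "http://localhost:8000/v1/infer"   # assumed endpoint

with open("pcb_sample.jpg", "rb") as f:
    response = requests.post(SERVICE_URL, files={"image": f}, timeout=30)

response.raise_for_status()
print(response.json())   # e.g., detected defect classes, scores, and boxes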

To learn more about the NVIDIA DeepStream Inference Builder, visit NVIDIA-AI-IOT/deepstream_tools on GitHub.

Additional applications for real-time visual inspection

In addition to identifying PCB defects, you can also apply TAO and DeepStream to spot anomalies in industries such as automotive and logistics. To read about a specific use case, see Slash Manufacturing AI Deployment Time with Synthetic Data and NVIDIA TAO.

Get started building a real-time visual inspection pipeline

With NVIDIA DeepStream and NVIDIA TAO, developers are pushing the boundaries of what’s possible in vision AI—from rapid prototyping to large-scale deployment. 

DeepStream 8.0 equips developers with powerful tools like the Inference Builder to streamline pipeline creation and improve tracking accuracy across complex environments. TAO 6 unlocks the potential of foundation models through domain adaptation, self-supervised fine-tuning, and knowledge distillation. 

This translates into faster iteration cycles, better use of unlabeled data, and production-ready inference services. 

Ready to get started? 

Download NVIDIA TAO 6 and explore the latest features. Ask questions and join the conversation in the NVIDIA TAO Developer Forum.

Download NVIDIA DeepStream 8 and explore the latest features. Ask questions and join the conversation in the NVIDIA DeepStream Developer Forum.
