1. Introduction
AI is bringing about a revolutionary change across many industries, from retail, manufacturing, and healthcare to automotive and more. Enterprises in these industries are operationalizing AI to perceive the world around us and generate real-time insights that would otherwise be impossible.
Moving AI from research to production by building a custom AI solution for a given use case is a nontrivial task. It starts with collecting and annotating the large, representative datasets required for training. Achieving a state-of-the-art deep learning model then requires considerable domain expertise: data scientists run many iterations and experiments before arriving at a model that performs well, which is extremely time-consuming. Finally, the trained model must be optimized for high-throughput, low-latency inference.
Speed up training with transfer learning
The most practical and scalable way to fast-track AI from concept to production is to fine-tune existing pretrained AI models with custom data. This approach addresses the proliferation and diversity of use cases across many industries and enables rapid prototyping and customization to meet the requirements of any environment.
NVIDIA TAO Toolkit, a low-code AI solution, solves these problems by enabling you to quickly train and adapt a model using transfer learning and then optimize it for inference with built-in NVIDIA TensorRT. Transfer learning is a training technique in which the features learned by one model are reused in another. This reduces the amount of data and training time required to customize models to your exact needs.
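To make the idea of transfer learning concrete, the following is a minimal sketch in plain Python, not TAO Toolkit code. It assumes a tiny, hypothetical "pretrained" feature extractor whose weights stay frozen, and it trains only a new classification head on a small custom dataset. All names, weights, and data here (PRETRAINED_WEIGHTS, extract_features, train_head, SAMPLES, LABELS) are illustrative assumptions.

```python
import math

# Hypothetical "pretrained" backbone weights. They stand in for features
# learned on a large source dataset and are never updated below.
PRETRAINED_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]

# Small custom dataset (illustrative): label is 1 when x[0] > x[1].
SAMPLES = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
LABELS = [1, 1, 0, 0]


def extract_features(x):
    """Frozen backbone: linear projection followed by ReLU."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)))
        for row in PRETRAINED_WEIGHTS
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def head_output(features, head_w, head_b):
    """Probability of class 1 from the new logistic-regression head."""
    return sigmoid(sum(w * f for w, f in zip(head_w, features)) + head_b)


def train_head(samples, labels, epochs=200, lr=0.5):
    """Train only the new head with SGD; the backbone is left untouched."""
    head_w = [0.0] * len(PRETRAINED_WEIGHTS)
    head_b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            features = extract_features(x)
            err = head_output(features, head_w, head_b) - y
            # Gradient step on the head parameters only.
            head_w = [w - lr * err * f for w, f in zip(head_w, features)]
            head_b -= lr * err
    return head_w, head_b


def predict(x, head_w, head_b):
    """Classify a sample using frozen features plus the trained head."""
    return 1 if head_output(extract_features(x), head_w, head_b) >= 0.5 else 0


head_w, head_b = train_head(SAMPLES, LABELS)
predictions = [predict(x, head_w, head_b) for x in SAMPLES]
```

Because the backbone is reused as-is, only three parameters (two head weights and one bias) are learned from the four custom samples. This mirrors, at toy scale, why transfer learning needs less data and training time than training from scratch.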
The model architectures and task-based models provided by the TAO Toolkit are state-of-the-art and have been proven on many common problems in computer vision, speech, and natural language understanding.
Below are some examples of the tasks these models cover:
- Computer vision: Object detection, classification, and semantic segmentation
- Speech AI: Automatic speech recognition (ASR) and text-to-speech (TTS)
- Natural Language Understanding (NLU): Question-answering, intent and slot classification, and punctuation
In the following sections, we illustrate the power of pretrained models and the TAO Toolkit to solve several challenges faced by many industries.
Here are some use cases that are covered in this whitepaper:
- Adapting to different camera types using the PeopleNet pretrained model and thermal images
- Prototyping with small datasets using a PCB inspection example
- Adding new classes to an existing model using a helmet detection example
- Customizing a pretrained action recognition model for autonomous shopping
If you would like to jump ahead and start experimenting, use the link to the GitHub project page.