Computer Vision / Video Analytics

Creating Custom AI Models Using NVIDIA TAO Toolkit with Azure Machine Learning

Workflow to create AI models with NVIDIA TAO Toolkit and Azure ML

A fundamental shift is currently taking place in how AI applications are built and deployed. AI applications are becoming more sophisticated and applied to broader use cases. This requires end-to-end AI lifecycle management—from data preparation, to model development and training, to deployment and management of AI apps. This approach can lower upfront costs, improve scalability, and decrease risk for customers using AI applications. 

While the cloud-native approach to app development can be appealing to developers, machine learning (ML) projects are notoriously time-intensive and cost-intensive, as they require a team with a varied skill set to build and maintain.

This post explains how you can accelerate your vision AI model development using NVIDIA TAO Toolkit and deploying it for inference with NVIDIA Triton Inference Server—all on the Azure Machine Learning (Azure ML) platform. 

NVIDIA Triton Inference Server is an open-source inference serving software that helps standardize model deployment and execution and delivers fast and scalable AI in production.

Azure ML is a cloud service for accelerating and managing the machine learning project lifecycle that enables developers to automate AI workflows, from data preparation and model training to model deployment. Developers can easily train, deploy, and manage AI models at scale with Azure ML. 

Watch the video below to see a complete walkthrough of how to fine-tune your models with NVIDIA TAO Toolkit and deploy them for inference with NVIDIA Triton Inference Server—all on Azure ML.

Video 1. Create and deploy a custom AI model with NVIDIA TAO Toolkit on Azure Machine Learning

The overall workflow comprises three main steps:

  • Install the NGC Azure ML Quick Launch Toolkit 
  • Train and optimize a pretrained object detection model
  • Deploy the optimized model on Azure ML with NVIDIA Triton Inference Server
Diagram of the overall workflow starting from running the quick launch toolkit, to training and deploying with NVIDIA Triton Inference Server.
Figure 1. Workflow for running NVIDIA TAO Toolkit on Azure ML

This section covers the steps required to install the NGC quick launch toolkit, which configures the Azure ML resources and uploads the necessary containers and models for training. The required config files are provided on the AzureML Quick Launch – TAO resource on the NVIDIA NGC Catalog.

Create a conda environment to install the Azure ML Quick Launch Toolkit to avoid any possible conflicts with existing libraries in your machine using the following code:

conda create -n azureml-ngc-tools python=3.8
conda activate azureml-ngc-tools

Install the Azure ML Quick Launch Toolkit using the following code:

pip install azureml-ngc-tools

Download resources from AzureML Quick Launch: TAO:

wget --content-disposition -O

File content

  • The azure_config.json file contains the details pertaining to the user credentials, Azure workspace, and GPU compute resources that need to be updated. Edit the azureml_user section with your Azure subscription ID, resource group, and workspace name. Next, edit the aml_compute section with GPU cluster details.
    • Recommended VMs: NCsv3, NDv2, or NC A100 v4 or ND A100 v4 series
    • OS: Ubuntu 20.04
    • To learn more about Azure VMs, refer to the Virtual Machine series. To learn how to spin up a Azure VM instance, refer to the Azure documentation.
  • The ngc_config.json file contains the content from the NGC Catalog, such as Docker containers and Jupyter notebooks, that you can upload into the Azure Machine Learning Resources.
  • Several scripts are packaged that will be used for model deployment.
  • Run the toolkit using the following code:
azureml-ngc-tools --login azure_config.json --app ngc_config.json

This will upload all the resources to the Azure ML Datastore. After the upload, a URL is generated for the Jupyter session, which enables you to interact with the session from your local web browser. You can verify that all the resources have been uploaded in the Azure ML portal. The steps to check the resources in Azure ML portal are provided in the video above. 

Train and optimize an object detection model with NVIDIA TAO Toolkit

This section covers the steps for training the model with NVIDIA TAO Toolkit on the Azure ML platform. 

Before you begin the training process, you need to run the auxiliary notebook called CopyData.ipynb. The notebook is automatically generated with azureml-ngc-tools. This copies the notebooks from the Datastore to the compute cluster. 

Screenshot of the CopyData.ipynb Jupyter notebook that user executes to copy notebooks from datastore to VM.
Figure 2. CopyData.ipynb Jupyter notebook for copying notebooks from the Datastore to the compute cluster

A new folder called tao is created with all the additional data provided. This folder contains the Jupyter notebook, along with the required configuration files for training. Navigate to the TAO_detectnet_v2.ipynb notebook under folder tao/detectnet_V2. The DetectNet_v2 is one of the many computer vision notebooks available for training.

Screenshot of TAO DetectNet_v2 notebook that is used for training the model.
Figure 3. TAO DetectNet_v2 Jupyter notebook used for training the model

Once you load up the notebook, simply execute each cell shown. For more information on this network or on how to configure hyperparameters, refer to the TAO Toolkit documentation. Some of the main steps covered in the notebook include: 

  • Setting environment variables
  • Downloading and converting training data 
  • Downloading the model from the NGC catalog
  • Training the model
  • Pruning the model to remove unwanted layers and reduce model size 
  • Retraining the pruned model to recover lost accuracy 
  • Quantize Aware Training (QAT), which changes the precision of the model to INT8, reducing model size without sacrificing accuracy 
  • Exporting the model for inference

Once the model is generated on the compute cluster, you will need to upload the model to the Azure ML workspace. To upload, run the UploadData.ipnyb notebook to copy the model to the Datastore. This model will be used for deployment.

Deploy the model using NVIDIA Triton Inference Server

Next, deploy the exported model using NVIDIA Triton Inference Server. Use the model trained in the previous step with NVIDIA TAO Toolkit and stored in the Datastore. Pull the model directly from the Datastore. 

Once the model has been uploaded, push the NVIDIA Triton container to the Azure Container Registry (ACR), create the inference end point, and test it using some sample images.

Register the model for inference

The next steps entail uploading the model for inference. Upload the trained model from the Datastore. Navigate to the Azure Machine Learning Studio to load the local model. 

1. After logging in, go to the Azure ML workspace (created earlier) using the azureml-ngc-tool script. Select ‘Models’ from the left menu, then click ‘Register’ and ‘From datastore.’

Screenshot of where to register the model for deployment from Azure ML Studio.
Figure 4. Register the model for deployment from Azure ML Studio

2. Select the ‘Triton’ model type to upload the model.

Screenshot of options to choose for model upload from Azure ML Studio.
Figure 5. Upload the model from Azure ML Studio

3. Browse and select the path: tao/detectnet _v2/model _repository 

Screenshot of options to choose when selecting model from datastore.
Figure 6. Datastore selection

4. Name the model ‘DetectNet’ and set the version to ‘1’

Screenshot of model settings during register model phase.
Figure 7. Register the model

5. Once the model has successfully uploaded, you should be able to see the directory

Screenshot of model artifacts in Azure ML model store.
Figure 8. Model artifacts in Azure ML model store

Build and upload NVIDIA Triton image to Azure Container Registry 

Next, build the NVIDIA Triton container with necessary dependencies and upload the image to Azure Container Registry (ACR). 

1. On your local machine, run the following script to create the NVIDIA Triton container with the necessary dependencies:

bash scripts/

2. Verify that the image has been created locally by executing:

docker image ls

If successful, you should see the repo named

3. Push the Docker image to ACR using the following script:

bash scripts/ <registryname>

The registryname parameter is the name of the provided Azure ML workspace default container registry. Navigate to the Workspace essential properties dashboard to find it in the Azure portal. This script will push the Docker image to ACR and tag it as ${registryname}

Screenshot to get the Container Registry name. This is where the container will be uploaded.
Figure 9. Get the Container Registry name, where the container will be uploaded

Once the script completes, navigate to ACR to see the container in the tao repository.

Create the Azure ML endpoint and deployment

On your local machine, run the following script to create an Azure ML Endpoint followed by the deployment:

bash scripts/create_endpoint <registryname>

The script will create the Azure ML Endpoint with the endpoint names provided in the endpoint_aml.yml file. It then deploys the NVIDIA Triton service on Azure using the deployment_aml.yml file. 

In this file, you can specify VM size. For this example, use the Standard_NC6s_v3 VM. These files are provided in scripts/auxiliary_files

Once the script execution is complete, you should be able to see the deployment information, REST endpoint, and authentication key on the Azure portal. The deployment information can be found by clicking into the endpoint and navigating to the Deployment Logs tab.

Screenshot of the Triton deployment endpoint with deployment name, REST endpoints and authentication key.
Figure 10. NVIDIA Triton deployment endpoint

Validate the endpoint 

You can validate the endpoint by using the REST endpoint URL and the primary key found under the Endpoints tab on the Azure portal. To query the Azure ML endpoint from the user local machine, run the following script:

bash scripts/ <REST API endpoint> <Primary key> <input image> <output path>

This script is provided in the zip file pulled from AzureML Quick Launch: TAO in the first step. 

Next, provide the REST endpoint URL from the NVIDIA Triton deployment endpoint. The endpoint is queried with the test image provided with the <input_image> option. The output image with bounding boxes is stored in <output path>


This post showed the end-to-end workflow for fine-tuning a model with NVIDIA TAO Toolkit and deploying the trained object detection model using NVIDIA Triton Inference Server, all on Azure Machine Learning. These tools abstract away the AI framework complexity, enabling you to build and deploy AI applications in production without the need for AI expertise. 

Discuss (0)