Developing an End-to-End Auto Labeling Pipeline for Autonomous Vehicle Perception

Mar 07, 2023

By Manoj C R, Edwin Jose, Gokul G Nair and Aparna M P

Accurately annotated datasets are crucial for camera-based deep learning algorithms to perform autonomous vehicle perception. However, manually labeling data is a time-consuming and cost-intensive process.

We have developed an automated labeling pipeline as a part of the Tata Consultancy Services (TCS) artificial intelligence (AI)-based autonomous vehicle platform. This pipeline uses NVIDIA DGX A100 and the TCS feature-rich semi-automatic labeling tool for review and correction. This post explains the design of the labeling pipeline, how NVIDIA DGX A100 accelerates labeling, and the savings achieved by implementing an auto labeling process.

Designing an auto labeling pipeline

An auto labeling pipeline must be able to generate the following annotations from images stored on a network storage drive:

2D object detection with visibility attributes (such as fully visible and occluded)
3D object detection with visibility attributes (such as fully visible and occluded)
lane detection with attributes (such as lane classification and lane color)
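As an illustration of what these three annotation types might look like in code, here is a minimal Python sketch. The class names, field names, and example attribute values (such as "dashed" or "white") are assumptions made for this example and are not taken from the TCS tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Box2D:
    label: str                                # object class, e.g. "car" (hypothetical)
    xyxy: Tuple[float, float, float, float]   # x_min, y_min, x_max, y_max in pixels
    visibility: str                           # e.g. "fully_visible" or "occluded"
    track_id: Optional[int] = None            # filled in later by a tracker


@dataclass
class Box3D:
    label: str
    center_xyz: Tuple[float, float, float]    # cuboid center (units are an assumption)
    size_lwh: Tuple[float, float, float]      # length, width, height
    yaw: float                                # heading angle in radians
    visibility: str


@dataclass
class Lane:
    points: List[Tuple[float, float]]         # polyline in image coordinates
    lane_class: str                           # lane classification, e.g. "dashed"
    color: str                                # lane color, e.g. "white"
    track_id: Optional[int] = None


@dataclass
class FrameAnnotation:
    frame_index: int
    boxes_2d: List[Box2D] = field(default_factory=list)
    boxes_3d: List[Box3D] = field(default_factory=list)
    lanes: List[Lane] = field(default_factory=list)


# Build one annotated frame with a 2D box and a lane.
frame = FrameAnnotation(frame_index=0)
frame.boxes_2d.append(Box2D("car", (100.0, 200.0, 300.0, 400.0), "occluded"))
frame.lanes.append(Lane([(0.0, 1279.0), (960.0, 700.0)], "dashed", "white"))
```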

We designed and trained customized deep neural networks (DNNs) for the 2D object detection, 3D object detection, and lane detection tasks. However, when testing the detector outputs for auto labeling, we observed a small number of missed detections, which added work for the people performing the labeling. Additionally, assigning attributes per object or lane took a considerable amount of time.

To fix these issues, we added a tracking algorithm that assigns a track ID to every detection. The algorithm, coupled with an enhanced copy-by-track-ID feature in the TCS semi-automatic labeling tool, helps correct attributes or detections that share the same track ID.
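The post does not say which tracking algorithm is used. To make the idea of assigning track IDs to detections concrete, the sketch below uses a simple greedy intersection-over-union (IoU) association between consecutive frames. It is a swapped-in illustrative technique, not the pipeline's tracker, and the function names and the 0.3 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def assign_track_ids(frames, iou_threshold=0.3):
    """Give each box a track ID by matching it to the previous frame's boxes.

    frames: list of frames, each a list of boxes.
    Returns a list of ID lists with the same shape as `frames`.
    """
    next_id = 0
    previous = []  # (track_id, box) pairs from the previous frame
    all_ids = []
    for boxes in frames:
        ids, used = [], set()
        for box in boxes:
            best_id, best_score = None, iou_threshold
            for track_id, prev_box in previous:
                if track_id in used:
                    continue  # each track is matched at most once per frame
                score = iou(box, prev_box)
                if score >= best_score:
                    best_id, best_score = track_id, score
            if best_id is None:  # no sufficiently overlapping track: start a new one
                best_id = next_id
                next_id += 1
            used.add(best_id)
            ids.append(best_id)
        previous = list(zip(ids, boxes))
        all_ids.append(ids)
    return all_ids


frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 1, 11, 11), (51, 50, 61, 60)],
    [(52, 50, 62, 60), (200, 200, 210, 210)],
]
track_ids = assign_track_ids(frames)
```

In this toy sequence the first object disappears in the third frame and a new object appears, so the new object receives a fresh ID while the second object keeps its ID throughout.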

The correction in one frame is replicated to subsequent frames where the track ID is the same, thereby accelerating the corrections. The end-to-end pipeline consists of the following additional modules:

2D object tracker
lane tracker
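The copy-by-track-ID behavior described above, where a correction made in one frame is replicated to subsequent frames with the same track ID, can be sketched as follows. This is a minimal illustration under assumed data structures (each object is a dictionary with a `track_id` key); the function name `copy_by_track_id` is hypothetical and does not describe the TCS tool's internals.

```python
def copy_by_track_id(frames, start_frame, track_id, corrected_attrs):
    """Apply a reviewer's correction to one track from start_frame onward.

    frames: list of frames, each a list of dicts that include a "track_id" key.
    Returns the number of objects that were updated.
    """
    updated = 0
    for frame in frames[start_frame:]:
        for obj in frame:
            if obj["track_id"] == track_id:
                obj.update(corrected_attrs)
                updated += 1
    return updated


# Three frames; track 7 was auto-labeled as fully visible in all of them.
sequence = [
    [{"track_id": 7, "visibility": "fully_visible"}, {"track_id": 8, "visibility": "occluded"}],
    [{"track_id": 7, "visibility": "fully_visible"}],
    [{"track_id": 7, "visibility": "fully_visible"}, {"track_id": 9, "visibility": "fully_visible"}],
]

# The reviewer corrects track 7 in frame 1; frame 0 is left untouched.
count = copy_by_track_id(sequence, start_frame=1, track_id=7,
                         corrected_attrs={"visibility": "occluded"})
```

One manual correction updates every later frame of that track, which is where the time saving comes from.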

Optimizing the pipeline

Due to the interdependencies between the underlying modules, not all the modules can run in parallel. As a solution, we divided the entire pipeline execution into three stages, so that the modules within a stage can execute in parallel.
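A staged layout like this can be sketched in Python with a thread pool: stages run one after another, and the modules inside a stage run concurrently. The grouping of modules into stages below is an assumption for illustration (the post does not list which module belongs to which stage), and the lambdas are placeholder stand-ins for the real detectors and trackers.

```python
from concurrent.futures import ThreadPoolExecutor


def run_staged(stages, context):
    """Run stages sequentially; run the modules inside each stage concurrently.

    stages: list of dicts mapping a module name to a callable(context) -> result.
    context: dict of shared inputs; each module's result is stored under its name.
    """
    for stage in stages:
        snapshot = dict(context)  # modules only see outputs of earlier stages
        with ThreadPoolExecutor(max_workers=len(stage)) as pool:
            futures = {name: pool.submit(fn, snapshot) for name, fn in stage.items()}
            results = {name: future.result() for name, future in futures.items()}
        context.update(results)
    return context


# Hypothetical stand-ins for the real modules; each returns a dummy result.
stages = [
    {   # independent detectors
        "det2d": lambda ctx: [f"box@{img}" for img in ctx["images"]],
        "det3d": lambda ctx: [f"cuboid@{img}" for img in ctx["images"]],
        "lanes": lambda ctx: [f"lane@{img}" for img in ctx["images"]],
    },
    {   # trackers depend on detector outputs
        "tracks2d": lambda ctx: list(enumerate(ctx["det2d"])),
        "lane_tracks": lambda ctx: list(enumerate(ctx["lanes"])),
    },
    {   # final export depends on the trackers
        "export": lambda ctx: {"objects": len(ctx["tracks2d"]),
                               "lanes": len(ctx["lane_tracks"])},
    },
]

result = run_staged(stages, {"images": ["f0", "f1"]})
```

Threads are used here only to keep the sketch short; GPU-bound modules might instead be launched as separate processes or CUDA streams.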

The input to the pipeline is a sequence of 206 images, each having a resolution of 1920 x 1280 pixels. The timings provided in Figures 1, 2, and 3 are based on the batch processing for each module.

Initially, with the base version of the pipeline, the end-to-end execution time of a batch was 16 minutes and 40 seconds, or 4.854 seconds per frame, when deployed on an NVIDIA DGX A100 system.

Figure 1 shows the module-wise time profiling for the base version. This execution time includes reading the image from the network drive where the raw images are stored, processing all the pipeline modules, and saving the automated annotations to the network drive.

Time profiling showed that reading raw images from the network drive took a considerable share of the processing time. Every module uses the raw images as one of its inputs, so the latency of reading images from the network drive adds a large overhead to the entire pipeline execution.
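The post does not name the profiler that was used. As a rough sketch of module-wise time profiling, the following Python context manager accumulates wall-clock time per named step. The step names and the `time.sleep` calls are placeholders that stand in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)  # step name -> accumulated seconds


@contextmanager
def timed(name):
    """Accumulate the wall-clock time spent inside the with-block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] += time.perf_counter() - start


# Placeholder workloads: a slow "network read" and a fast "module".
with timed("read_images"):
    time.sleep(0.05)
with timed("detect_2d"):
    time.sleep(0.001)

slowest = max(timings, key=timings.get)
for name, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {seconds:.3f} s")
```

Sorting the steps by accumulated time makes a dominant cost, such as image reading, stand out immediately.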

For the second version of the pipeline, the DGX RAID memory was set up as intermediate storage. In stage 0, all the raw images are copied from the network drive to the DGX RAID memory (Figure 2).

The modules load the raw images from the RAID memory and all the outputs are stored in the RAID memory. After the pipeline execution, the final annotated outputs are moved to the network drive.
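The staging pattern described above (copy inputs to fast local storage once, work locally, then move the final outputs back) can be sketched with the Python standard library. The directory names and the `*.png` pattern are assumptions, and temporary directories stand in for the network drive and the RAID volume.

```python
import shutil
import tempfile
from pathlib import Path


def stage_images(network_dir, local_dir, pattern="*.png"):
    """Copy raw images from the (slow) network drive to (fast) local storage."""
    network_dir, local_dir = Path(network_dir), Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in sorted(network_dir.glob(pattern)):
        dst = local_dir / src.name
        shutil.copy2(src, dst)  # copy2 also preserves file timestamps
        staged.append(dst)
    return staged


def publish_outputs(local_out, network_out):
    """Move the final annotation files from local storage to the network drive."""
    local_out, network_out = Path(local_out), Path(network_out)
    network_out.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in sorted(local_out.iterdir()):
        if src.is_file():
            moved.append(Path(shutil.move(str(src), str(network_out / src.name))))
    return moved


# Demo with temporary directories standing in for the real storage locations.
with tempfile.TemporaryDirectory() as tmp:
    network = Path(tmp) / "network_drive"
    raid = Path(tmp) / "raid_scratch"
    network.mkdir()
    for i in range(3):
        (network / f"frame_{i:04d}.png").write_bytes(b"fake image bytes")

    staged_names = [p.name for p in stage_images(network, raid / "raw")]

    # Pipeline modules would read from raid/raw and write to raid/out.
    (raid / "out").mkdir()
    (raid / "out" / "annotations.json").write_text("{}")
    published_names = [p.name for p in publish_outputs(raid / "out", network / "labels")]
    local_out_is_empty = not any((raid / "out").iterdir())
```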

The total execution time of a batch of 206 images with version 2 of the pipeline was reduced to 6 minutes 21 seconds or 1.84 seconds per frame. Figure 2 shows the module-wise time profiling for version 2 of the pipeline.

Version 3 of the pipeline uses an NVIDIA NGC container with TensorFlow 2.5 and NVIDIA CUDA 11.4.1 (tensorflow:21.08-tf2-py3). PyTorch was installed to meet the dependencies of the underlying DNN modules, and OpenCV with GPU support was built from source on top of this base image.

The core deep learning algorithms required further acceleration to reach the desired processing time. Using NVIDIA TensorRT 8.0.1.6 in the NGC Docker container, the lane detection model, the 2D object detection model, and the 3D object detection model were all converted to FP16 TensorRT models and integrated into version 3 of the pipeline.
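The post does not describe the conversion workflow. One common route, assumed here, is to export each model to ONNX and build an FP16 engine with the TensorRT Python API. The sketch below targets the TensorRT 8.0 API; the function name and file paths are hypothetical, and it requires a machine with TensorRT and a supported GPU.

```python
def build_fp16_engine(onnx_path, engine_path, workspace_bytes=1 << 30):
    """Build a serialized FP16 TensorRT engine from an ONNX file (assumed workflow)."""
    import tensorrt as trt  # imported lazily so this module loads without TensorRT

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    # The ONNX parser requires an explicit-batch network definition.
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parsing failed: " + "; ".join(errors))

    config = builder.create_builder_config()
    # TensorRT 8.0 attribute; later releases replace it with memory-pool limits.
    config.max_workspace_size = workspace_bytes
    if builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 kernels where available

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("TensorRT engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized)
    return engine_path


# Hypothetical usage (paths are placeholders):
# build_fp16_engine("lane_detector.onnx", "lane_detector_fp16.engine")
```

Setting the FP16 flag permits reduced-precision kernels but does not force them, so accuracy should be re-validated against the FP32 model after conversion.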

As a result, the pipeline’s end-to-end execution time dropped to 3 minutes 30 seconds for 206 frames, or 1.01 seconds per frame. Figure 3 shows the module-wise time profiling for version 3. We achieved the results without compromising the required accuracy of the models. Figure 4 shows results from this auto labeling pipeline.
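The per-frame figures follow from dividing each batch time by the 206 frames in the sequence. The short script below reproduces that arithmetic from the batch times quoted in this post and also computes the speedup over the base version. The computed values can differ from the quoted per-frame figures in the last digit, which may reflect truncation or more precise underlying timings.

```python
FRAMES = 206  # images per batch, as stated in this post

# End-to-end batch times in seconds, taken from the text.
batch_seconds = {
    "base": 16 * 60 + 40,      # 16 min 40 s
    "version 2": 6 * 60 + 21,  # 6 min 21 s
    "version 3": 3 * 60 + 30,  # 3 min 30 s
}

per_frame = {name: secs / FRAMES for name, secs in batch_seconds.items()}
speedup = {name: batch_seconds["base"] / secs for name, secs in batch_seconds.items()}

for name, secs in batch_seconds.items():
    print(f"{name}: {secs} s per batch, "
          f"{per_frame[name]:.3f} s per frame, "
          f"{speedup[name]:.2f}x vs. base")
```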

The processing time savings per module during the end-to-end execution of the pipeline are shown in Figure 5. Running on one NVIDIA A100 40 GB GPU in the NVIDIA DGX system, the deep learning algorithms were also optimized to reduce model-loading overhead, which yields further savings toward the scale-up required for round-the-clock auto labeling.

Conclusion

With the compute performance of the NVIDIA DGX A100, along with TCS expertise in AI and deep learning algorithm deployment, we developed a highly efficient auto labeling pipeline for AV camera perception algorithms. Effective utilization of the DGX RAID memory and NVIDIA TensorRT reduced the processing time of the auto labeling pipeline to less than one-fourth of the base version's time (from 16 minutes 40 seconds to 3 minutes 30 seconds per batch of 206 frames).

Deploying this auto labeling pipeline for a global automotive component supplier achieved a 65% reduction in manual effort, whereas state-of-the-art open models such as YOLOX and LaneNet provided just a 34% reduction.

Want to learn more? Register for NVIDIA GTC 2023 for free and join us March 20–23 for Developing Robust Multi-Task Models for AV Perception. Check out the targeted session tracks for autonomous vehicle developers, including mapping, simulation, safety, and more.

About the Authors

About Manoj C R
With 21 years of experience and primary expertise in AI and high computing devices, Manoj C R heads the AI/Gen AI and Autonomous Driving Center of Excellence at TCS. He is primarily responsible for defining futuristic solutions in AI and generative AI for next-generation software-defined vehicles and leveraging them for vehicle development for global OEMs and Tier 1 suppliers. He owns 21 patents in AI and computer vision and has published numerous conference papers and journal articles.

About Edwin Jose
With 10 years of work experience, Edwin Jose works as a Technical Architect in AI and Machine Learning at the Autonomous Vehicles Center of Excellence at TCS. He holds a master’s degree in Automotive Electronics from Amrita Vishwa Vidyapeetham. His areas of interest include the design and development of camera-based perception algorithms for automated labeling and vehicle deployment.

About Gokul G Nair
Gokul G Nair is an AI developer at Tata Consultancy Services working on the Center of Excellence – ADAS team. He holds a master’s degree in Automotive Electronics from Amrita Vishwa Vidyapeetham. His areas of interest include computer vision and deep learning. He is responsible for the design and development of computer vision algorithms and for deploying them both in pipelines running on DGX and in real time.

About Aparna M P
Aparna M P is a senior AI developer at Tata Consultancy Services working on the Center of Excellence – ADAS team. She holds a master’s degree with a gold medal in Automotive Electronics from Amrita Vishwa Vidyapeetham. Her interests center on the design of computer vision and deep learning algorithms for automotive applications. She has published multiple research papers in the AI perception domain and has been a speaker at ML symposiums.
