NVIDIA Research: Fast Uncertainty Quantification for Deep Object Pose Estimation

Researchers from NVIDIA, the University of Texas at Austin, and Caltech developed a simple, efficient, plug-and-play uncertainty quantification method for the 6-DoF (six degrees of freedom) object pose estimation task. The method uses an ensemble of K pre-trained estimators with different architectures and/or training data sources.

The researchers presented their paper, “Fast Uncertainty Quantification for Deep Object Pose Estimation” (FastUQ), at the 2021 International Conference on Robotics and Automation (ICRA 2021).

FastUQ focuses on uncertainty quantification for deep object pose estimation. A major challenge in deep learning-based object pose estimation (see NVIDIA DOPE) is that the estimators can be overconfident in their pose predictions.

For example, the two figures below show pose estimation results for the “Ketchup” object from a DOPE model in a manipulation task. The model is highly confident in both results, but the left one is incorrect.

Another challenge that FastUQ addresses is the sim2real gap. Typically, deep learning-based pose estimators are trained on synthetic datasets (here rendered with the NVIDIA ray-tracing renderer, NViSII), but we want to apply these estimators in the real world and quantify their uncertainty there. For example, the left figure is from the synthetic NViSII dataset, and the right one is from the real world.

In this project, we propose an ensemble-based method for fast uncertainty quantification of deep learning-based pose estimators. The idea is illustrated in the following two figures. In the left figure, the deep models in the ensemble disagree with each other, which implies more uncertainty. In the right figure, the models agree with each other, which reflects less uncertainty.
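The disagreement idea can be sketched in a few lines of Python. This is only an illustrative sketch, not the FastUQ implementation: the disagreement measure used here (the average, over all pairs of ensemble members, of the mean distance between the object's model points under each predicted pose) is an assumption, as are the function names `transform`, `pairwise_disagreement`, and `rot_z`, and the toy cube model and poses.

```python
import itertools

import numpy as np


def transform(points, rotation, translation):
    """Apply a rigid transform to an (N, 3) array of model points."""
    return points @ rotation.T + translation


def pairwise_disagreement(points, poses):
    """Average pairwise distance between ensemble pose predictions.

    `poses` is a list of K (rotation, translation) tuples, one per
    estimator in the ensemble. For each pair of poses, the model points
    are transformed by both and the mean point-to-point distance is
    taken; the result is the mean over all pairs. This metric is an
    assumption made for illustration.
    """
    if len(poses) < 2:
        raise ValueError("need at least two pose predictions")
    transformed = [transform(points, R, t) for R, t in poses]
    distances = [
        np.linalg.norm(a - b, axis=1).mean()
        for a, b in itertools.combinations(transformed, 2)
    ]
    return float(np.mean(distances))


def rot_z(angle):
    """Rotation matrix for a rotation of `angle` radians about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Hypothetical object model: the eight corners of a 10 cm cube (meters).
model_points = np.array(list(itertools.product([-0.05, 0.05], repeat=3)))

# Three estimators that predict the same pose (they agree).
agreeing_poses = [(rot_z(0.0), np.array([0.0, 0.0, 0.5]))] * 3

# Three estimators that predict clearly different poses (they disagree).
disagreeing_poses = [
    (rot_z(0.0), np.array([0.0, 0.0, 0.5])),
    (rot_z(np.pi / 2), np.array([0.1, 0.0, 0.5])),
    (rot_z(np.pi), np.array([0.0, 0.1, 0.6])),
]

low = pairwise_disagreement(model_points, agreeing_poses)
high = pairwise_disagreement(model_points, disagreeing_poses)
print(f"agreeing ensemble:    {low:.4f} m")
print(f"disagreeing ensemble: {high:.4f} m")
```

A larger value indicates that the ensemble members place the object in more different poses, which this sketch treats as higher uncertainty. Because the score only needs the K pose predictions, it can be computed without ground-truth poses and without retraining the estimators.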

This research is interdisciplinary and was carried out through the joint efforts of different research teams at NVIDIA:

  • The AI Algorithms team, led by Anima Anandkumar, and the NVIDIA AI Robotics Research Lab in Seattle worked on the uncertainty quantification methods.
  • The Learning and Perception Research team, led by Jan Kautz, trained the deep object pose estimation models and provided photorealistic synthetic data from NVIDIA’s ray-tracing renderer, NViSII.

To train the deep estimators and generate the high-fidelity photorealistic synthetic datasets, the team used NVIDIA V100 GPUs and NVIDIA OptiX (C++/CUDA back-end) for acceleration.

FastUQ is a novel, fast uncertainty quantification method for deep object pose estimation that is efficient, plug-and-play, and supports a general class of pose estimation tasks. This research could have a significant impact on autonomous driving and general autonomy, including more robust and safer perception and uncertainty-aware control and planning.

To learn more about the research, visit the FastUQ project website.

Thank you to Guanya Shi at Caltech for his help with the figures and text of this blog post.
