AI pioneer Andrew Ng is calling for a broad shift to a more data-centric approach to machine learning (ML). He recently held the first data-centric AI competition on data quality, which many claim represents 80% of the work in AI.
“I’m optimistic that the AI community before long will take as much interest in systematically improving data as architecting models,” Ng wrote in his newsletter, The Batch.
Data-centric approach with synthetic data
Adopting a data-centric approach to model development when using synthetic data is an iterative process. Engineers evaluate trained models and identify improvements to the dataset. Then, they generate new datasets and start a new training cycle. This cycle of generating data, training the model, evaluating the model, and generating more data continues until the model performs as desired.
Data in each iteration is generated and labeled in simulation rather than collected in the real world, which speeds up model training. These datasets, which can be generated at scale, are output in a format that the training tools can use directly, eliminating additional data preprocessing steps.
Parameterizing the synthetic data generation process gives the ML engineer more control over each iteration and traceability of what already exists in the dataset. Together, improving the dataset with synthetic data, generating at scale, and knowing what the dataset contains and how it was generated shorten the time developers need to achieve results.
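The generate, train, evaluate cycle described above can be sketched in a few lines of Python. This is a minimal illustration rather than the workflow used in this post: the function names, the toy stand-ins for generation, training, and evaluation, and the score threshold are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple

Dataset = List[dict]


def data_centric_loop(
    generate: Callable[[int, List[str]], Dataset],
    train: Callable[[Dataset], dict],
    evaluate: Callable[[dict], Tuple[float, List[str]]],
    target_score: float,
    max_iterations: int = 10,
) -> Tuple[Optional[dict], List[float]]:
    """Repeat generate -> train -> evaluate until the score target is met."""
    dataset: Dataset = []
    weaknesses: List[str] = []
    history: List[float] = []
    model: Optional[dict] = None
    for iteration in range(max_iterations):
        # Generate new data aimed at the weaknesses found in the last cycle.
        dataset = dataset + generate(iteration, weaknesses)
        model = train(dataset)
        score, weaknesses = evaluate(model)
        history.append(score)
        if score >= target_score:
            break
    return model, history


# Toy stand-ins (hypothetical): a real setup would call the simulator,
# the training tool, and a validation suite here.
def toy_generate(iteration: int, weaknesses: List[str]) -> Dataset:
    focus = weaknesses or ["baseline"]
    return [{"iteration": iteration, "focus": focus[i % len(focus)]} for i in range(100)]


def toy_train(dataset: Dataset) -> dict:
    return {"num_samples": len(dataset)}


def toy_evaluate(model: dict) -> Tuple[float, List[str]]:
    # Pretend the score grows with dataset size and saturates at 1.0.
    score = min(1.0, model["num_samples"] / 500)
    weaknesses = [] if score >= 1.0 else ["occluded_forklift"]
    return score, weaknesses


model, history = data_centric_loop(toy_generate, toy_train, toy_evaluate, target_score=0.9)
```

The key design point is that the evaluation step returns not just a score but a list of weaknesses, which feeds directly into the next round of data generation.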
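One way to picture this traceability is to store the generation parameters alongside each dataset and derive a stable identifier from them. The sketch below uses only the Python standard library; the parameter names, value ranges, and manifest layout are hypothetical and are not the format used by Omniverse Replicator.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class GenerationParams:
    """Hypothetical parameters controlling one dataset-generation run."""
    seed: int
    num_images: int
    light_intensity_range: Tuple[float, float]
    camera_distance_range_m: Tuple[float, float]


def dataset_fingerprint(params: GenerationParams) -> str:
    """Return a short, deterministic ID derived from the parameters."""
    canonical = json.dumps(asdict(params), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def build_manifest(params: GenerationParams) -> dict:
    """Record what was generated and how, for later audits."""
    return {"fingerprint": dataset_fingerprint(params), "params": asdict(params)}


params = GenerationParams(
    seed=7,
    num_images=1000,
    light_intensity_range=(200.0, 2000.0),
    camera_distance_range_m=(2.0, 12.0),
)
record = build_manifest(params)
```

Because the fingerprint depends only on the parameters, two runs with identical settings share an ID, and any change to a parameter produces a different one.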
Introducing NVIDIA Omniverse Replicator for Isaac Sim
Consistent with the growing focus on data quality, NVIDIA is releasing the new Omniverse Replicator for Isaac Sim application, which is based on the recently announced Omniverse Replicator synthetic data-generation engine. These new capabilities in Isaac Sim enable ML engineers to build production-quality synthetic datasets to train robust deep-learning perception models. “Replicating” the inherent distribution of the model’s target domain is the key to maximizing model performance.
Omniverse Replicator for Isaac Sim advantages
- Generates datasets to achieve stochastic, controlled, and bounded distributions set as targets by the developer.
- Ensures that datasets contain targeted corner and test cases.
- Supports placement of objects, lighting, and the scene relative to the camera field of view.
- Works at scale on edge- and cloud-based systems.
- Traces tools and parameters used in each dataset to drive iterative processes and support quality audits on production datasets.
Replicator demo: Avoiding forklift tines with an autonomous mobile robot
Many current-generation, factory-deployed autonomous mobile robots (AMRs) rely on planar lidar. The lidar is sufficient to detect many objects and to navigate. The forklift, which is common in factories and warehouses, presents a unique challenge for lidar: the forklift chassis can be detected but not the tines.
As a result, the robot avoids colliding with the body of the forklift only to run over the tines, which the planar lidar cannot detect.
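The geometry behind this failure can be illustrated with a simplified model: a planar lidar scans a single horizontal plane, so it can only see parts of an object whose vertical extent crosses that plane. The heights below are made-up values chosen for illustration, not measurements of any real robot or forklift.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    z_min_m: float  # lowest point above the floor, in meters
    z_max_m: float  # highest point above the floor, in meters


def visible_to_planar_lidar(part: Part, scan_height_m: float) -> bool:
    """A part is detectable only if the scan plane passes through it."""
    return part.z_min_m <= scan_height_m <= part.z_max_m


# Hypothetical dimensions: lowered tines sit below the scan plane.
chassis = Part("chassis", z_min_m=0.10, z_max_m=2.00)
tines = Part("tines", z_min_m=0.00, z_max_m=0.08)
SCAN_HEIGHT_M = 0.20  # assumed lidar mounting height

detected = {p.name: visible_to_planar_lidar(p, SCAN_HEIGHT_M) for p in (chassis, tines)}
```

Under these assumed heights, the chassis intersects the scan plane and the tines do not, which matches the failure described above.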
One way to address this issue is to have the robot “perceive” that there is a forklift in its path and use that recognition to improve navigation to avoid the tines.
This section outlines a demonstration of the entire workflow of using Isaac Sim Replicator to train a deep neural network (DNN) to solve the AMR/forklift problem.
Forklift demo key steps
- Build a warehouse scene in Isaac Sim on Omniverse.
- Place an AMR in the warehouse and re-create the failure scenario.
- Acquire forklift models and generate synthetic data using Isaac Sim.
- Use synthetic data to train the existing pretrained model using TAO Toolkit.
- Deploy the model using DNN Inference Isaac ROS GEM.
- Test the Isaac ROS GEM in simulation.
- Deploy the Isaac ROS GEM in the robot software stack on the NVIDIA Jetson platform.
Generating the dataset with the Omniverse Replicator for Isaac Sim
In this demo, we acquired eight different 3D forklift Universal Scene Description (USD) models to train the DNN. Isaac Sim Replicator was then used to describe the numerous parameters affecting the object (forklift), the lighting, the camera, and the scenario itself.
Special care was taken with domain randomization so that the deep-learning model would generalize its understanding of forklifts. We varied the color, texture, and lighting of the forklift; its position relative to the camera; and its yaw, pitch, and roll. We also added non-forklift objects to the scene. This variation helps the model learn to recognize forklifts in general rather than memorize specific examples.
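As a rough illustration of domain randomization, the sketch below samples one set of scene parameters per image from a seeded random generator. It does not use the Replicator API: the field names, texture names, and value ranges are assumptions made for this example, and only the count of eight forklift models comes from the demo.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

NUM_FORKLIFT_MODELS = 8  # eight forklift USD models, as in the demo
TEXTURES = ("painted_metal", "worn_metal", "matte_plastic")  # hypothetical names


@dataclass(frozen=True)
class SceneSample:
    forklift_model: int
    color_rgb: Tuple[float, float, float]
    texture: str
    light_intensity: float
    position_m: Tuple[float, float]  # (lateral, forward) relative to the camera
    yaw_deg: float
    pitch_deg: float
    roll_deg: float
    num_distractors: int  # non-forklift objects added to the scene


def sample_scene(rng: random.Random) -> SceneSample:
    """Draw one randomized scene description; all ranges are assumed."""
    return SceneSample(
        forklift_model=rng.randrange(NUM_FORKLIFT_MODELS),
        color_rgb=(rng.random(), rng.random(), rng.random()),
        texture=rng.choice(TEXTURES),
        light_intensity=rng.uniform(200.0, 2000.0),
        position_m=(rng.uniform(-3.0, 3.0), rng.uniform(2.0, 12.0)),
        yaw_deg=rng.uniform(-180.0, 180.0),
        pitch_deg=rng.uniform(-5.0, 5.0),
        roll_deg=rng.uniform(-5.0, 5.0),
        num_distractors=rng.randint(0, 6),
    )


def sample_dataset(num_scenes: int, seed: int) -> List[SceneSample]:
    """Seeding the generator makes the sampled dataset reproducible."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(num_scenes)]


scenes = sample_dataset(num_scenes=1000, seed=42)
```

Each sampled description would then drive the simulator to render and label one image. Keeping the sampling separate from rendering makes it easy to inspect the parameter distribution before spending GPU time.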
Ultimately, more than 90,000 images were generated for this demonstration. We used the new Omniverse Farm, a systems layer for multi-GPU and multi-agent simulations, to manage the GPU-compute resources that created the dataset.
These images represent the type of data diversity required to achieve robust performance.
Training and deploying the forklift detector DNN
We chose a pretrained model from NVIDIA TAO Toolkit to perform the segmentation task required to recognize forklifts. The model was pretrained to perform semantic segmentation on person, car, and background classes. We used transfer learning to adapt this model to perform semantic segmentation on forklifts using the synthetic data generated in Isaac Sim.
The next step was to add the forklift model to the DNN Inference Isaac ROS GEM. This GEM lets you quickly add inference to ROS-based robotics applications, such as the AMR use case described in this post.
Improving performance for challenging AI-based computer vision applications requires large and diverse datasets that replicate the inherent distribution of the target domain. The new NVIDIA Omniverse Replicator for Isaac Sim is a powerful application that can generate production-quality datasets.
We demonstrated how synthetic data can be used to train DNNs running on an AMR to avoid the common accident of running over forklift tines.
There are many other scenarios where you can apply this process and use synthetic data to increase the robot’s understanding of its environment and how it should behave. Ultimately, this will lead to robots that are involved in fewer accidents and require less frequent human intervention.
The new release of NVIDIA Isaac Sim will be available in mid-November with new synthetic data generation features and numerous enhancements for developers and roboticists.
Check out a previous post that discusses Trimble’s work to leverage Isaac Sim to train the Boston Dynamics ‘Spot’ robot with synthetic data.