
How to Train Autonomous Mobile Robots to Detect Warehouse Pallet Jacks Using Synthetic Data

Synthetic data can play a key role when training perception AI models that are deployed on autonomous mobile robots (AMRs). This process is becoming increasingly important in manufacturing. For an example of using synthetic data to generate a pretrained model that can detect pallets in a warehouse, see Developing a Pallet Detection Model Using OpenUSD and Synthetic Data.

This post explores how to train AMRs to detect warehouse pallet jacks using synthetic data. Pallet jacks are commonly used in warehouses to lift and transport heavy pallets. In a crowded warehouse, it’s important for the AMR to detect and avoid colliding with a pallet jack.

To achieve this goal, it’s necessary to train the AI model with a large and diverse set of data under varying lighting conditions and occlusions. Real data can rarely capture the full range of potential scenarios. Synthetic data generation (SDG), which is annotated data generated from a 3D simulation, enables developers to overcome the data gap and bootstrap the model training process. 

Video 1. Synthetic data generation using NVIDIA Omniverse Replicator for NVIDIA Isaac Sim

As in the pallet detection example, this use case takes a data-centric approach, manipulating the data rather than changing the model parameters to fit the data. The process begins by generating synthetic data using NVIDIA Omniverse Replicator in NVIDIA Isaac Sim. Next, train the model with the synthetic data in NVIDIA TAO Toolkit. Finally, visualize the model’s performance on real data, and modify the parameters to generate better synthetic data until the desired level of performance is reached.

Omniverse Replicator is a core extension of NVIDIA Omniverse, a computing platform that enables individuals and teams to develop workflows based on Universal Scene Description (OpenUSD). Replicator enables developers to build custom synthetic data generation pipelines to generate data to bootstrap the training of computer vision models. 

Iterating with synthetic data to improve model performance

The sections below explain how the team iterated with synthetic data to improve the real-world performance of the object detection model. They walk through the steps using Python scripts that work with the Omniverse Replicator APIs.

For each iteration, we incrementally changed various parameters in the model and generated new sets of training data. The model’s performance was then validated against real data. We continued this process until we were able to close the sim-to-real gap. 

The process of varying object or scene parameters is called domain randomization. You can randomize many parameters, including the location, color, and texture of objects and the lighting and background of the scene, allowing you to quickly generate new data for model training.
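As a minimal, self-contained sketch of the idea (the cube, value ranges, and frame count here are illustrative and not part of the pallet jack workflow), a Replicator randomization loop looks like this:

import omni.replicator.core as rep

# Illustrative only: scatter a cube and recolor it on every generated frame
cube = rep.create.cube(semantics=[("class", "cube")])

with rep.trigger.on_frame(num_frames=10):
    with cube:
        rep.modify.pose(position=rep.distribution.uniform((-2, -2, 0), (2, 2, 0)))
        rep.randomizer.color(colors=rep.distribution.uniform((0, 0, 0), (1, 1, 1)))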

OpenUSD, the extensible 3D scene description framework that serves as the foundation for NVIDIA Omniverse, makes it easy to experiment with different parameters of a scene. Parameters can be modified and tested in individual layers, and users can author non-destructive overrides on top of those layers.
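As a minimal sketch using the pxr USD Python API (the file names and prim path below are hypothetical), a non-destructive override can be authored on its own layer like this:

from pxr import Usd, Sdf, UsdGeom

# Open the base scene and add a sublayer to hold experiment overrides
stage = Usd.Stage.Open("warehouse.usd")
override_layer = Sdf.Layer.CreateNew("experiment_overrides.usd")
stage.GetRootLayer().subLayerPaths.insert(0, override_layer.identifier)

# Direct all edits to the override layer; the base scene files stay untouched
stage.SetEditTarget(Usd.EditTarget(override_layer))
jack = stage.GetPrimAtPath("/Warehouse/PalletJack_01")  # hypothetical path
UsdGeom.XformCommonAPI(jack).SetTranslate((2.0, 0.0, 0.0))
stage.Save()

Removing the sublayer restores the original scene, which is what makes this style of experimentation non-destructive.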

Preparation

To get started with this example, you’ll need a system with NVIDIA RTX GPUs and the latest version of NVIDIA Isaac Sim installed. Isaac Sim is a scalable robotics simulation application that leverages the core functionality of Omniverse Replicator for generating synthetic data. For details on installation and configuration, see the Isaac Sim documentation.

When Isaac Sim is up and running, you can then download all the assets from NVIDIA-AI-IOT/synthetic_data_generation_training_workflow on GitHub.

Iteration 1: Changing color and camera position

For the first iteration, the team varied the color and pose of the pallet jack, along with the pose of the camera. Follow the steps below to replicate this scenario in your own session. 

Start by loading the stage:

from omni.isaac.core.utils.stage import open_stage

# prefix_with_isaac_asset_server (a helper in the workflow scripts) prepends
# the Isaac Sim asset-server root to this relative path
ENV_URL = "/Isaac/Environments/Simple_Warehouse/warehouse.usd"
open_stage(prefix_with_isaac_asset_server(ENV_URL))

Then add pallet jacks and a camera to the scene. The pallet jacks can be loaded from the SimReady Asset library. 

import omni.replicator.core as rep

PALLETJACKS = ["http://omniverse-content-production.s3-us-west-2.amazonaws.com/Assets/DigitalTwin/Assets/Warehouse/Equipment/Pallet_Trucks/Scale_A/PalletTruckScale_A01_PR_NVD_01.usd",
               "http://omniverse-content-production.s3-us-west-2.amazonaws.com/Assets/DigitalTwin/Assets/Warehouse/Equipment/Pallet_Trucks/Heavy_Duty_A/HeavyDutyPalletTruck_A01_PR_NVD_01.usd",
               "http://omniverse-content-production.s3-us-west-2.amazonaws.com/Assets/DigitalTwin/Assets/Warehouse/Equipment/Pallet_Trucks/Low_Profile_A/LowProfilePalletTruck_A01_PR_NVD_01.usd"]

cam = rep.create.camera(clipping_range=(0.1, 1000000))

SimReady, or simulation-ready, assets are physically accurate 3D objects with realistic physical properties and behaviors. They come preloaded with the metadata and semantic annotations required for model training.
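The randomization snippet below references a rep_palletjack_group handle that the full workflow scripts create from the USD list above. As a hedged sketch (the group size and instancing mode here are illustrative), it can be built like this:

# Instantiate pallet jacks from the SimReady USD list and tag them with
# the semantic class used for the object detection labels
rep_palletjack_group = rep.randomizer.instantiate(
    PALLETJACKS, size=2, mode="scene_instance")
with rep_palletjack_group:
    rep.modify.semantics([("class", "palletjack")])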

Next, add domain randomization for the pallet jacks and the camera:

with cam:
    rep.modify.pose(
        position=rep.distribution.uniform((-9.2, -11.8, 0.4), (7.2, 15.8, 4)),
        look_at=(0, 0, 0))

# Get the pallet jack body mesh and modify its color
with rep.get.prims(path_pattern="SteerAxles"):
    rep.randomizer.color(colors=rep.distribution.uniform((0, 0, 0), (1, 1, 1)))

# Randomize the pose of all the added pallet jacks
with rep_palletjack_group:
    rep.modify.pose(
        position=rep.distribution.uniform((-6, -6, 0), (6, 12, 0)),
        rotation=rep.distribution.uniform((0, 0, 0), (0, 0, 360)),
        scale=rep.distribution.uniform((0.01, 0.01, 0.01), (0.01, 0.01, 0.01)))
Figure 1. Synthetic images showing randomized color and location of the pallet jacks and randomized camera position

Finally, configure writers for annotating data:

writer = rep.WriterRegistry.get("KittiWriter")
writer.initialize(output_dir=output_directory,
                  omit_semantic_type=True)

Note that this example uses the KittiWriter provided with Replicator to store object detection labels in KITTI format, ensuring easier compatibility with existing training pipelines.
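To actually produce frames, the writer is attached to a render product and the randomizations run under a frame trigger. The sketch below assumes the cam created earlier; the resolution and frame count are illustrative placeholders:

# Attach the writer to a render product created from the randomized camera
render_product = rep.create.render_product(cam, (1024, 1024))
writer.attach([render_product])

# Register the randomizations inside a frame trigger so they re-execute
# for every generated frame, then run the pipeline
with rep.trigger.on_frame(num_frames=2000):
    with cam:
        rep.modify.pose(
            position=rep.distribution.uniform((-9.2, -11.8, 0.4), (7.2, 15.8, 4)),
            look_at=(0, 0, 0))

rep.orchestrator.run()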

Results 

For this first batch of synthetic data, the team visualized real-world model performance against the LOCO dataset, a scene understanding dataset for logistics that covers the detection of logistics-specific objects.

The resulting images show that the model still struggles to detect the pallet jack in a crowded warehouse (Figure 2): many bounding boxes are drawn around objects surrounding the pallet jack. This result is somewhat expected, given that it is the first training iteration. Reducing the domain gap is the focus of the subsequent iterations.

Figure 2. Real-world images showing many false positives after validating the model against real data

Iteration 2: Adding textures and changing ambient lighting

In this iteration, the team randomized the textures and ambient lighting of the scene, in addition to the pallet jack color and camera position from the first iteration.

Activate the randomization for both textures and lighting:

# Randomize the lighting of the scene
with rep.get.prims(path_pattern="RectLight"):
    rep.modify.attribute("color", rep.distribution.uniform((0, 0, 0), (1, 1, 1)))
    rep.modify.attribute("intensity", rep.distribution.normal(100000.0, 600000.0))
    rep.modify.visibility(rep.distribution.choice([True, False, False, False, False, False, False]))

# Select a random material for the floor
random_mat_floor = rep.create.material_omnipbr(
    diffuse_texture=rep.distribution.choice(textures),
    roughness=rep.distribution.uniform(0, 1),
    metallic=rep.distribution.choice([0, 1]),
    emissive_texture=rep.distribution.choice(textures),
    emissive_intensity=rep.distribution.uniform(0, 1000))

with rep.get.prims(path_pattern="SM_Floor"):
    rep.randomizer.materials(random_mat_floor)
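Note that the material randomizer above assumes a textures list of image URLs. A hedged sketch (these particular pattern files are examples only; any texture images reachable from the asset server work):

# Example texture paths resolved against the Isaac Sim asset server;
# substitute any set of texture images available to you
textures = [prefix_with_isaac_asset_server(path) for path in [
    "/Isaac/Materials/Textures/Patterns/nv_asphalt_yellow_weathered.jpg",
    "/Isaac/Materials/Textures/Patterns/nv_tile_hexagonal_green_white.jpg",
]]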

Figure 3 shows the resulting synthetic images. Notice the various textures that have been added to the background, along with different types of ambient light incident on the objects.

Figure 3. Synthetic images of pallet jacks with backgrounds of different textures

Results

With the addition of texture and lighting randomization, this iteration shows a reduction in the number of false positives. One crucial factor when generating synthetic data is ensuring good diversity in the resulting dataset. Similar or repetitive data from the synthetic domain will likely not help improve real-world model performance.

To improve the diversity of the dataset, add more objects in the scene with randomization. This is addressed in the third iteration and should help improve model robustness.

Figure 4. Real-world images showing that the model detects pallet jacks with higher precision after training on randomized texture and lighting images

Iteration 3: Adding distractors

This iteration introduces additional objects, called distractors, into the scene. These distractors add more diversity to the dataset. This iteration also includes all the changes shown in the first two iterations. 

Add distractors to the scene:

DISTRACTORS_WAREHOUSE = ["/Isaac/Environments/Simple_Warehouse/Props/S_TrafficCone.usd",
                         "/Isaac/Environments/Simple_Warehouse/Props/S_WetFloorSign.usd",
                         "/Isaac/Environments/Simple_Warehouse/Props/SM_BarelPlastic_A_01.usd",
                         "/Isaac/Environments/Simple_Warehouse/Props/SM_BarelPlastic_A_02.usd",
                         "/Isaac/Environments/Simple_Warehouse/Props/SM_BarelPlastic_A_03.usd"]

# Randomize the pose of all the distractors in the scene
with rep_distractor_group:
    rep.modify.pose(
        position=rep.distribution.uniform((-6, -6, 0), (6, 12, 0)),
        rotation=rep.distribution.uniform((0, 0, 0), (0, 0, 360)),
        scale=rep.distribution.uniform(1, 1.5))

Note that all the assets used in this project are available with the default Isaac Sim installation. Load them by specifying their path relative to the Nucleus server, as shown in the sketch below.
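The pose randomizer above also references a rep_distractor_group handle. A hedged sketch of how it can be built (the group size is an illustrative value):

# Resolve the relative prop paths against the Nucleus asset root and
# instantiate a pool of distractors
distractor_urls = [prefix_with_isaac_asset_server(p) for p in DISTRACTORS_WAREHOUSE]
rep_distractor_group = rep.randomizer.instantiate(
    distractor_urls, size=10, mode="scene_instance")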

Figure 5. Synthetic images of pallet jacks surrounded by common warehouse objects (distractors)

Results

Figure 6 shows results from the third iteration. The model now detects the pallet jacks accurately, with far fewer spurious bounding boxes. Model performance has improved significantly compared to the first iteration.

Figure 6. Real-world images showing that the model detects pallet jacks with high precision 

Continue iterating

The team used 5,000 images to train the model for each iteration. You can continue to iterate on this workflow by generating more variations, along with increasing the size of your synthetic data, to reach the desired level of accuracy.

We used NVIDIA TAO Toolkit to train a DetectNet_v2 model with a resnet18 backbone for these experiments. Using this model is not a workflow requirement. You can use the generated data and annotations to train a model with the architecture and framework of your choice.

We used the KITTI writer in our experiments. However, you can write your own custom writer with Omniverse Replicator to generate data in the annotation format your training workflow expects, enabling seamless compatibility with your pipelines.
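Following the custom writer pattern from the Replicator documentation, a minimal sketch looks like this (the annotator names and output format are illustrative):

import omni.replicator.core as rep
from omni.replicator.core import Writer, AnnotatorRegistry, BackendDispatch

class MyCustomWriter(Writer):
    def __init__(self, output_dir):
        self._backend = BackendDispatch({"paths": {"out_dir": output_dir}})
        self._frame_id = 0
        # Request only the annotators this writer consumes
        self.annotators = [AnnotatorRegistry.get_annotator("rgb"),
                           AnnotatorRegistry.get_annotator("bounding_box_2d_tight")]

    def write(self, data):
        # data holds one entry per requested annotator; convert the boxes
        # to the annotation format your pipeline expects here
        self._backend.write_image(f"rgb_{self._frame_id:05d}.png", data["rgb"])
        self._frame_id += 1

rep.WriterRegistry.register(MyCustomWriter)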

You can also experiment with mixing real and synthetic data during your training process. The final model can be optimized and deployed on NVIDIA Jetson in the real world after obtaining satisfactory evaluation metrics.

Develop synthetic data pipelines with Omniverse Replicator

With Omniverse Replicator, you can build your own custom synthetic data generation pipeline or tools to programmatically generate large sets of diverse synthetic data to bootstrap your model, and iterate quickly. Introducing various types of randomizations adds the necessary diversity to the dataset, enabling the model to recognize the object or objects of interest in a variety of conditions. 

To get started with the workflow featured in this post, visit NVIDIA-AI-IOT/synthetic_data_generation_training_workflow on GitHub. To see the full workflow in action, join Rishabh Chadha of NVIDIA and Jenny Plunkett of Edge Impulse as they showcase how to use Omniverse Replicator and synthetic data to train object detection models for manufacturing processes (Video 2).

Video 2. Learn how to train computer vision models with synthetic data

To build your own custom synthetic data generation pipeline, download Omniverse free and follow the instructions for getting started with Replicator in Omniverse Code. You can also take the self-paced online course, Synthetic Data Generation for Training Computer Vision Models and watch the latest Omniverse Replicator tutorials.

NVIDIA recently released Omniverse Replicator 1.10 with new support for developers building low-code SDG workflows. For details, see Boost Synthetic Data Generation with Low-Code Workflows in NVIDIA Omniverse Replicator 1.10.

NVIDIA Isaac ROS 2.0 and NVIDIA Isaac Sim 2023.1 are also now available with major updates to performant perception and high-fidelity simulation. To learn more, see Accelerate AI-Enabled Robotics with Advanced Simulation and Perception Tools on NVIDIA Isaac Platform.

Stay up to date with NVIDIA Omniverse by subscribing to the newsletter and following Omniverse on Instagram, LinkedIn, Medium, Threads, and Twitter. For more, check out our forums, Discord server, Twitch and YouTube channels.
