
Build Synthetic Data Pipelines to Train Smarter Robots with NVIDIA Isaac Sim

A GIF of a robot's view in a warehouse.

As robots take on increasingly dynamic mobility tasks, developers need physics-accurate simulations that scale efficiently across environments and workloads. Training robot policies and models for these tasks requires large amounts of high-quality data, which is often expensive and time-consuming to collect in the physical world.

Synthetic data, generated using simulated environments in NVIDIA Isaac Sim, is a practical way to scale up the data generation process. 

In this blog, we explore:

  1. Creating a simulated environment by bringing in environment assets with NVIDIA Omniverse NuRec.
  2. Adding simulation-ready (SimReady) assets to a simulated scene. 
  3. Generating synthetic data in Isaac Sim or NVIDIA Isaac Lab using MobilityGen.
  4. Augmenting generated data using NVIDIA Cosmos world foundation models (WFMs).

Reconstruct 3D digital twins using Omniverse NuRec

Omniverse NuRec is a set of technologies for reconstructing and rendering interactive 3D simulations from real-world sensor data. The reconstructed environments are used across domains like robotics, autonomous vehicles (AV), and industrial and geospatial applications for generating synthetic data, training AI models, and testing model behavior.

Isaac Sim supports NuRec rendering of neural reconstructions such as neural radiance fields (NeRFs), 3D Gaussian splats (3DGS), and 3D Gaussian unscented transforms (3DGUT), with the data rendered in OpenUSD for simulation. You can load compatible assets and scenes in Isaac Sim and control rendering through the OmniNuRecVolumeAPI properties. Learn more about NuRec for robotics use cases in the documentation.
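
If you prefer to set up NuRec content from a script rather than the UI, the following minimal sketch uses the standard OpenUSD Python API (pxr) to reference a reconstruction into a stage and list the properties of any prims that have the OmniNuRecVolumeAPI schema applied. The stage and asset paths are placeholders, and the exact attribute names depend on your Isaac Sim version.

from pxr import Usd, Sdf

# Placeholder paths -- replace with your own stage and NuRec reconstruction.
STAGE_PATH = "warehouse_nurec_scene.usd"
NUREC_ASSET = "my_nurec_reconstruction.usd"  # hypothetical reconstructed asset

stage = Usd.Stage.CreateNew(STAGE_PATH)

# Reference the reconstructed environment under a dedicated prim.
env_prim = stage.DefinePrim(Sdf.Path("/World/NuRecEnvironment"), "Xform")
env_prim.GetReferences().AddReference(NUREC_ASSET)

# Inspect prims that carry the OmniNuRecVolumeAPI schema and print the
# rendering-related properties they expose.
for prim in stage.Traverse():
    if "OmniNuRecVolumeAPI" in prim.GetAppliedSchemas():
        print(prim.GetPath())
        for attr in prim.GetAttributes():
            print("  ", attr.GetName(), "=", attr.Get())

stage.GetRootLayer().Save()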

Add SimReady assets to a simulated scene

SimReady assets are accurate, OpenUSD-based 3D models with built-in semantic labels, dense captions, and USDPhysics-based physics properties that streamline robot simulation setup.

The SimReady Warehouse 01 Assets Pack includes a large collection of USD models of objects like pallets, storage racks, and ramps. You can simply drag and drop these into your scene. For robotics and related use cases, explore the Physical AI dataset.

The following video shows how to add SimReady assets to a scene in Isaac Sim.

Video 1. Populating warehouse scenes with physically accurate 3D objects using simple drag-and-drop functionality

In this way, we can easily create scenes with multiple objects in simulation. A major use of these simulated environments is to collect synthetic data for training robot policies, which we will learn about in the next section.  
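
Scenes can also be assembled programmatically. The following is a rough sketch (with a placeholder asset URL and illustrative transform values) that uses the OpenUSD Python API to reference a SimReady USD file into a stage and position it:

from pxr import Usd, UsdGeom, Sdf, Gf

# Placeholder URL -- point this at a SimReady asset from the pack.
PALLET_USD = "omniverse://localhost/Library/SimReady/pallet_01.usd"

stage = Usd.Stage.CreateNew("warehouse_scene.usd")
UsdGeom.SetStageUpAxis(stage, UsdGeom.Tokens.z)

# Reference the SimReady asset and place it in the scene.
pallet = UsdGeom.Xform.Define(stage, Sdf.Path("/World/Pallet_01"))
pallet.GetPrim().GetReferences().AddReference(PALLET_USD)
pallet.AddTranslateOp().Set(Gf.Vec3d(2.0, -1.5, 0.0))

stage.GetRootLayer().Save()

Because SimReady assets already carry semantic labels and physics properties, a referenced prim needs little additional setup before it can be used for simulation and synthetic data generation.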

Try the SimReady Standardization workflow to design, validate, and implement standardized 3D asset specifications in OpenUSD.

Generate synthetic data using MobilityGen

MobilityGen is a workflow built on Isaac Sim for generating data for mobile robots. It supports data collection through manual methods like keyboard and gamepad teleoperation, and through automated methods like random accelerations and random path following.

In the following example, you’ll learn how MobilityGen is used to generate data for an H1 humanoid robot in Isaac Sim. This workflow can be used for other robot embodiments, like quadrupeds and autonomous mobile robots (AMRs), and has been tested on the Spot and Carter robots.

While data from MobilityGen can train mobility policies for robots, performance improves when the data includes visual diversity. We’ll learn about augmenting data with visual diversity using NVIDIA Cosmos in the next section. 

The following outlines the steps involved in generating data using MobilityGen.

  1. Build an Occupancy Map: This is a grid-based representation of the robot’s environment where each cell represents the probability of being occupied by an obstacle.
  2. Record a trajectory: A trajectory of a mobile robot specifies position, velocity, and orientation at every instant as it moves through its environment. 
  3. Replay and render: You can replay the generated trajectories to evaluate and visualize data.

The following videos show how to generate synthetic data in Isaac Sim using MobilityGen.

Video 2. Creating occupancy maps for training mobility models across different robot embodiments
Video 3. Recording collision-free paths and capturing RGB/depth camera data from the robot’s perspective

In the following example, we use a warehouse environment available in Isaac Sim to run MobilityGen. You can create your own environment using the SimReady assets we learned about in the last section.

Steps for building an Occupancy Map

  1. Load the warehouse stage:
    1. Open Content Browser (Window > Browsers > Content).
    2. Load the warehouse USD file in Isaac Sim/Environments/Simple_Warehouse/warehouse_multiple_shelves.usd.
  2. Create the Occupancy Map
    1. Select Tools > Robotics > Occupancy Map to open the extension.
    2. In the Occupancy Map window, set Origin to:
      1. X: 2.0
      2. Y: 0.0
      3. Z: 0.0
      4. Note: To input a value in the text box, Ctrl + left click to activate the input mode.
    3. In the Occupancy Map window, set Upper Bound to:
      1. X: 10.0
      2. Y: 20.0
      3. Z: 2.0 (Assumes the robot can move under 2-meter overpasses)
    4. In the Occupancy Map window, set Lower Bound to:
      1. X: -14.0
      2. Y: -18.0
      3. Z: 0.1 (Assumes the robot can move over 10 cm bumps)
    5. Click Calculate to generate the occupancy map.
    6. Click Visualize Image to view the occupancy map.
    7. In the Visualization window, under Rotate Image, select 180.
    8. In the Visualization window, under Coordinate Type, select ROS Occupancy Map Parameters File YAML.
    9. Click Regenerate Image.
    10. Copy the YAML text generated to your clipboard.
    11. In a text editor of choice, create a new file named ~/MobilityGenData/maps/warehouse_multiple_shelves/map.yaml.
      Note: On Windows, replace ~ with a directory of your choice.
    12. Paste the YAML text copied from the Visualization window into the created file.
    13. Edit the line image: warehouse_multiple_shelves.png to read image: map.png.
    14. Save the file.
    15. Back in the Visualization window, click Save Image.
    16. In the tree explorer, open the folder ~/MobilityGenData/maps/warehouse_multiple_shelves.
    17. In the file name field, enter map.png.
    18. Click Save.

Verify that you now have a folder named ~/MobilityGenData/maps/warehouse_multiple_shelves/ with files named map.yaml and map.png inside.
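
As a quick sanity check, you can load the files back and confirm that the YAML references the saved image. The following is a small sketch that assumes PyYAML and Pillow are installed; the field names follow the ROS occupancy map parameters format generated by the Visualization window.

import os
import yaml              # PyYAML, assumed installed
from PIL import Image    # Pillow, assumed installed

map_dir = os.path.expanduser("~/MobilityGenData/maps/warehouse_multiple_shelves")

with open(os.path.join(map_dir, "map.yaml")) as f:
    params = yaml.safe_load(f)

# The ROS-style parameters file should point at the saved image and
# carry the map resolution and origin.
print("image:", params["image"])                   # expected: map.png
print("resolution (m/cell):", params["resolution"])
print("origin:", params["origin"])

img = Image.open(os.path.join(map_dir, params["image"]))
print("map size (pixels):", img.size)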

Steps for recording a trajectory

After creating a map of the environment, you can generate data with MobilityGen:

  1. Enable the MobilityGen UI extension.
    1. Navigate to Window > Extensions and search for MobilityGen UI.
    2. Click the toggle switch for the MobilityGen UI extension.
    3. Note: You should see two windows appear. One is the MobilityGen UI; the other displays the occupancy map and visualizations. One window might be hiding behind the other when they first appear, so we recommend dragging them into a window pane to view both at the same time.
  2. Build the scenario:
    1. In the MobilityGen window under Stage, paste the following USD:
      http://omniverse-content-production.s3-us-west-2.amazonaws.com/Assets/Isaac/5.0/Isaac/Environments/Simple_Warehouse/warehouse_multiple_shelves.usd
    2. In the MobilityGen window, under Occupancy Map, enter the path to the map.yaml file created previously at
      ~/MobilityGenData/maps/warehouse_multiple_shelves/map.yaml
    3. Under the Robot dropdown, select H1Robot.
    4. Under the Scenario dropdown, select KeyboardTeleoperationScenario.
    5. Click Build.
      After a few seconds, verify that you see the scene and that the occupancy map appears.
  3. Test drive the robot using the following keys:
    1. W – Move forward
    2. A – Turn left
    3. S – Move backwards
    4. D – Turn right
  4. Start recording:
    1. Click Start recording to start recording a log.
    2. Move the robot around.
    3. Click Stop recording to stop recording.

The data is now recorded to ~/MobilityGenData/recordings by default.
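
A quick way to confirm the data was captured is to list what was written under that directory, for example with a short script like the one below. The exact layout of each recording depends on the MobilityGen version, so treat this only as a convenience sketch.

import os

recordings_dir = os.path.expanduser("~/MobilityGenData/recordings")

# Print each recording folder and a rough count of the files it contains.
for name in sorted(os.listdir(recordings_dir)):
    path = os.path.join(recordings_dir, name)
    if not os.path.isdir(path):
        continue
    n_files = sum(len(files) for _, _, files in os.walk(path))
    print(f"{name}: {n_files} files")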

Steps for replaying and rendering

After recording a trajectory, which includes data like robot poses, you can now replay the scenario.

  1. Use the replay_directory.py Python script that ships with Isaac Sim. To run the script, call the following from inside the Isaac Sim directory:
./python.sh standalone_examples/replicator/mobility_gen/replay_directory.py --render_interval 40 --enable isaacsim.replicator.mobility_gen.examples

After the script finishes, verify that you have a folder ~/MobilityGenData/replays, which contains the rendered sensor data. You can open this folder to explore the data.

There are examples of how to load and work with the recorded data in the open-source MobilityGen GitHub repository. We recommend visualizing your recorded data by running the Gradio Visualization Script.
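
If you want a quick preview without the Gradio script, you can stitch rendered frames into a video yourself. The sketch below assumes the rendered RGB frames are stored as PNG files somewhere under the replay folder (see the MobilityGen repository for the actual layout) and that imageio with its ffmpeg plugin is installed.

import glob
import os
import imageio.v2 as imageio  # assumed installed: pip install imageio imageio-ffmpeg

replay_dir = os.path.expanduser("~/MobilityGenData/replays")

# Assumption: rendered RGB frames are saved as PNG files under the replay
# folder; adjust the glob pattern to match the actual directory layout.
frames = sorted(glob.glob(os.path.join(replay_dir, "**", "*.png"), recursive=True))

with imageio.get_writer("replay_preview.mp4", fps=30) as writer:
    for frame_path in frames:
        writer.append_data(imageio.imread(frame_path))

print(f"Wrote replay_preview.mp4 from {len(frames)} frames")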

Find more information, such as adding a custom robot, in the tutorial on Data Generation with MobilityGen.

Augment generated training data using NVIDIA Cosmos

After generating data using MobilityGen, use Cosmos Transfer to generate photorealistic videos from the synthetic robot data. This adds visual variation, helping reduce the sim-to-real gap and improve policy performance after deployment.

Figure 1. The high-level SDG workflow includes generating synthetic data using MobilityGen and augmenting the data using Cosmos Transfer, which results in high-quality datasets for training robot models

Cosmos Transfer is a WFM that generates photorealistic videos from inputs of multiple video modalities like RGB, depth, and segmentation. Along with the input video, you can provide a text prompt describing how you want the generated video to look. The following is an example prompt:

A realistic warehouse environment with consistent lighting, perspective, and camera motion. Preserve the original structure, object positions, and layout from the input video. Ensure the output exactly matches the segmentation video frame-by-frame in timing and content. Camera movement must follow the original path precisely.
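
In the Cosmos Transfer inference command shown later in this section, the prompt and input video are supplied through a controlnet spec JSON file (the --controlnet_specs argument). The snippet below is only an illustrative sketch of writing such a file; the key names and the video path are assumptions based on the single-control edge example, so refer to the Cosmos Transfer repository for the authoritative schema.

import json

# Illustrative only: key names and paths are assumptions, not the
# authoritative Cosmos Transfer spec schema.
spec = {
    "prompt": (
        "A realistic warehouse environment with consistent lighting, "
        "perspective, and camera motion. Preserve the original structure, "
        "object positions, and layout from the input video."
    ),
    "input_video_path": "outputs/mobilitygen_rgb.mp4",  # hypothetical input video
    "edge": {"control_weight": 1.0},
}

with open("my_control_spec.json", "w") as f:
    json.dump(spec, f, indent=2)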

Video 4 shows how to run Cosmos Transfer on MobilityGen data to add visual variation.

Video 4. Processing Isaac Sim synthetic data and converting warehouse scenes into realistic training datasets

Video 5. The inference process to generate photorealistic videos

As seen in the video, an example command to run inference with Cosmos Transfer is:

# Adjust these defaults to match the GPUs and checkpoint location on your system.
export CUDA_VISIBLE_DEVICES="${CUDA_VISIBLE_DEVICES:=0,1,2,3}"
export CHECKPOINT_DIR="${CHECKPOINT_DIR:=./checkpoints}"
export NUM_GPU="${NUM_GPU:=4}"

# Run multi-GPU inference; the controlnet spec JSON supplies the input video
# and prompt, and the --offload_* flags reduce GPU memory usage.
PYTHONPATH=$(pwd) torchrun --nproc_per_node=$NUM_GPU --nnodes=1 --node_rank=0 cosmos_transfer1/diffusion/inference/transfer.py \
    --checkpoint_dir $CHECKPOINT_DIR \
    --video_save_folder outputs/example1_single_control_edge \
    --controlnet_specs assets/inference_cosmos_transfer1_single_control_edge.json \
    --offload_text_encoder_model \
    --offload_guardrail_models \
    --num_gpus $NUM_GPU

During closed-loop evaluation in the lab, a policy trained on synthetic and Cosmos-augmented data consistently outperformed a policy trained on synthetic data alone. The policy trained on the augmented data handles the following scenarios well:

  • Navigating around transparent obstacles.
  • Avoiding obstacles that blend into the background, like a gray pole on a gray floor.
  • Passing closer to obstacles, reducing the overall distance traveled to reach a goal position.
  • Navigating in dimly lit environments.
  • Navigating narrow passages.  

You can run Cosmos Transfer on any real or synthetic video data. Another example is the tutorial on Using Cosmos for Synthetic Dataset Augmentation, which explains how to generate synthetic data using Replicator in Isaac Sim.

Getting started

NVIDIA provides a comprehensive collection of OpenUSD resources to accelerate your learning journey. Start with the self-paced Learn OpenUSD, Digital Twins, and Robotics training curricula that build the foundational skills covered in this guide.

For professionals ready to take the next steps in their robotics career, the OpenUSD Development certification offers a professional-level exam that validates your expertise in building, maintaining, and optimizing 3D content pipelines using OpenUSD. Get OpenUSD certified in person at NVIDIA GTC Washington, D.C., and learn more about synthetic data for robot development during the Physical AI and Robotics Day.

Tune in to upcoming OpenUSD Insiders livestreams and connect with the NVIDIA Developer Community. Stay up to date by following NVIDIA Omniverse on Instagram, LinkedIn, X, Threads, and YouTube.
