
How to Build a Generative AI-Enabled Synthetic Data Pipeline with OpenUSD

Training physical AI models used to power autonomous machines, such as robots and autonomous vehicles, requires huge amounts of data. Acquiring large sets of diverse training data can be difficult, time-consuming, and expensive. Data is often limited due to privacy restrictions or concerns, or simply may not exist for novel use cases. In addition, the available data may not apply to the full range of potential situations, limiting the model’s ability to accurately predict and respond to diverse scenarios.

Synthetic data, generated from digital twins through computer simulations, offers an alternative to real-world data, enabling developers to bootstrap physical AI model training. You can quickly generate large, diverse datasets by varying many different parameters such as layout, asset placement, location, color, object size, and lighting conditions. This data can then help train a model that generalizes well.

Achieving photorealism is critical to reducing the sim-to-real domain gap. This process aims to give every object in the virtual environment the correct attributes, such as materials and textures, so that it accurately mimics its real-world counterpart. Without the help of AI, this is a manual, time-consuming process. Generative AI can speed up many aspects of the process, from asset creation to code generation, helping developers build robust and diverse training datasets.

This post explains how you can build custom synthetic data generation (SDG) pipelines using NVIDIA NIM microservices for USD with NVIDIA Omniverse Replicator. NVIDIA NIM is a set of accelerated inference microservices that allow organizations to run AI models on NVIDIA GPUs anywhere: in the cloud, data center, workstations, and PCs. Omniverse Replicator is an SDK built on Universal Scene Description (OpenUSD) and NVIDIA RTX.

This post also shows how to further augment the generated images using a sample ComfyUI workflow as part of a reference pipeline. The resulting images can then be used with pretrained models and tools such as NVIDIA TAO, PyTorch, or TensorFlow.

Reference workflow overview

The workflow starts with a 3D scene of a pre-existing warehouse that contains all the necessary 3D assets, such as shelves, boxes, and pallets. To learn more about creating the digital twin of the factory, see the workflow example. To further augment the 3D scene, you can use 3D NIM microservices to add more assets and to swap the 360° HDRI background images for additional randomization.

Figure 1. The end-to-end reference workflow for training robot foundation models with NVIDIA Isaac Sim, from scene generation through 3D domain randomization and simulation to 2D domain randomization and training

The next step involves generating the code needed for domain randomization using USD Code NIM, a state-of-the-art large language model (LLM) that answers OpenUSD knowledge queries and generates USD-Python code. Domain randomization is a powerful and necessary step in creating synthetic data. Many of the parameters in the scene can be changed programmatically inside Omniverse Replicator.

Once the annotated synthetic data is generated, we’ll perform generative AI augmentation to diversify and expand the dataset even further using simple prompts.

Scene generation with USD NIM microservices 

USD Code helps you learn and develop with OpenUSD more productively within existing 3D development workflows: you simply type in prompts and receive responses. USD Code can answer your questions as well as write custom OpenUSD Python code.

You can try USD Code in the NVIDIA NIM API catalog, or through Visual Studio Code, and use it in Omniverse Kit as a sample extension to modify the OpenUSD stage. For details, see the USD Code NIM documentation.

See the example prompt below for generating code:

Write a Python function that, given a stage, adds rect lights in a 6×5 grid on the x and y axis with the dimensions of x:5 and y:10. Each rect light should have an intensity of 5000 and exposure of 1 with size of (10,3). The lights should be aligned on the z axis and be nested under an xform called Lights.

The resulting code is shown below: 

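from pxr import Sdf, Usd, UsdGeom, UsdLux

def create_rect_lights(stage: Usd.Stage) -> None:
    """Create a 6x5 grid of rect lights on the x and y axis under /Lights."""
    # Parent xform; defining each light at a /Lights/... path makes it a child automatically
    UsdGeom.Xform.Define(stage, Sdf.Path("/Lights"))

    for i in range(6):        # 6 columns along x, spaced 5 units apart
        for j in range(5):    # 5 rows along y, spaced 10 units apart
            light_path = Sdf.Path(f"/Lights/Light_{i}_{j}")
            rect_light = UsdLux.RectLight.Define(stage, light_path)
            rect_light.CreateIntensityAttr().Set(5000.0)
            rect_light.CreateExposureAttr().Set(1.0)
            # RectLight has no size attribute; its extent is set with width and height
            rect_light.CreateWidthAttr().Set(10.0)
            rect_light.CreateHeightAttr().Set(3.0)
            # Every light shares z = 0, so the grid is aligned on the z axis
            rect_light.AddTranslateOp().Set((i * 5.0, j * 10.0, 0.0))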

When executed, the resulting code creates 30 new lights in the warehouse with the prescribed spacing (Figure 2). The same approach can be used to add more assets such as shelves, boxes, and forklifts to complete the scene.
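The spacing arithmetic in the generated function can be checked without a USD stage. The following pure-Python sketch (the function name rect_light_grid is hypothetical, not part of any NVIDIA API) reproduces the prim paths and translations that create_rect_lights assigns, which makes it easy to confirm that the grid holds 30 lights and spans 25 units on x and 40 units on y.

```python
from typing import List, Tuple

Placement = Tuple[str, Tuple[float, float, float]]

def rect_light_grid(cols: int = 6, rows: int = 5,
                    spacing_x: float = 5.0, spacing_y: float = 10.0,
                    z: float = 0.0) -> List[Placement]:
    """Return (prim path, translation) pairs for a cols x rows grid of lights."""
    placements = []
    for i in range(cols):          # columns along x
        for j in range(rows):      # rows along y
            path = f"/Lights/Light_{i}_{j}"
            placements.append((path, (i * spacing_x, j * spacing_y, z)))
    return placements

grid = rect_light_grid()
print(len(grid))   # 30
print(grid[-1])    # ('/Lights/Light_5_4', (25.0, 40.0, 0.0))
```

Keeping the layout math in a plain function like this also makes it simple to try other grid sizes before asking USD Code to regenerate the stage code.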

Figure 2. A warehouse scene with lights (left) generated from code (right) using USD Code NIM

If you need additional assets or backgrounds to enhance the scene, you can also use services built with NVIDIA Edify, a powerful multimodal architecture for building AI models that generate visual content, such as 4K images, detailed 3D meshes, 16K 360° HDRIs, PBR materials, and video. The AI models are then optimized and packaged for maximum performance with NVIDIA NIM. This speeds up the content creation process.

With NVIDIA Edify-powered Generative 3D from Shutterstock, you can generate a mesh preview in under 10 seconds from a text prompt or reference image. You can then generate a ready-to-edit mesh with a PBR material in a few minutes, enabling rapid set dressing, concepting, or prototyping. In addition, 360 HDRI generation, also powered by NVIDIA Edify, lets you produce 16K 360° HDRIs from text or image prompts, generating backgrounds that match the lighting of 3D scenes.

Shutterstock Generative 3D APIs are in commercial beta and can be accessed through TurboSquid by Shutterstock.

In addition, fVDB is an open-source deep-learning framework that can be used to generate large-scale scenes for training spatial intelligence using real-world 3D data. It builds AI operators on top of OpenVDB to create high-fidelity virtual representations of real-world environments, including neural radiance fields (NeRFs), surface reconstruction from point clouds, and even large-scale generative AI. These rich 3D datasets are AI-ready for efficient model training and inference. To learn more, see Building Spatial Intelligence from Real-World 3D Data Using Deep-Learning Framework fVDB.

Generate domain randomization code

Domain randomization is an important technique for adding diversity to a dataset, and it’s one of the core functionalities of Omniverse Replicator. You can programmatically change any number of variables in a given scene, including lighting, object location, materials, and textures. A diverse dataset helps the perception model perform well across many different scenarios.
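To make the idea concrete, here is a minimal, library-independent sketch of what randomizing scene variables per frame can look like. The parameter names, ranges, and the FLOOR_MATERIALS list are hypothetical placeholders rather than Replicator settings; a fixed seed is used so the same dataset can be regenerated exactly.

```python
import random
from dataclasses import dataclass
from typing import Tuple

# Hypothetical choices; substitute the materials available in your scene.
FLOOR_MATERIALS = ["concrete", "linoleum", "epoxy"]

@dataclass
class SceneSample:
    light_intensity: float
    light_on: bool
    box_offset_xy: Tuple[float, float]
    box_scale: float
    floor_material: str

def sample_scene(rng: random.Random) -> SceneSample:
    """Draw one set of randomized scene parameters for a single frame."""
    return SceneSample(
        light_intensity=rng.uniform(2000.0, 8000.0),
        light_on=rng.random() < 0.8,                  # keep most lights on
        box_offset_xy=(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)),
        box_scale=rng.uniform(0.8, 1.2),
        floor_material=rng.choice(FLOOR_MATERIALS),
    )

rng = random.Random(42)  # fixed seed, so the parameter sequence is reproducible
samples = [sample_scene(rng) for _ in range(10)]
```

Each sample would then be applied to the stage before a frame is captured. Inside Replicator itself you would typically reach for its built-in distributions, such as rep.distribution.uniform, rather than Python's random module.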

However, writing code for large-scale domain randomization can be tedious and slow down the iterative process of data generation. The solution? Leverage the power of USD Code NIM to act as a copilot. 

This section walks you through how to use USD Code NIM to generate code as a starting point for domain randomization. You can activate USD Code either in Visual Studio Code or directly in any Omniverse Kit-based application, where the ChatUSD agent extension opens the USD Code window for entering prompts.

Figure 3. The ChatUSD agent extension bundle, opened and toggled to activate in the NVIDIA Omniverse launcher

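To begin, type in the following prompt. It describes the function you want and includes the Replicator script scaffold, with an empty move_light function, that the function should fit into: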

In the open stage, I’d like to have a function to randomly move the light named “/Root/RectLight_03” between 0 and -20 meters on only the x axis.

import asyncio

import omni.replicator.core as rep
import omni.usd

stage = omni.usd.get_context().get_stage()
camera = '/OmniverseKit_Persp'
render_product = rep.create.render_product(camera, (1024, 1024))

def move_light() -> None:
    pass  # USD Code is asked to fill in this function

# Initialize and attach writer
writer = rep.writers.get("BasicWriter")
writer.initialize(output_dir="_output", rgb=True)
writer.attach([render_product])

async def go(num_frames=10):
    for _ in range(num_frames):
        move_light()                            # randomize before each capture
        await rep.orchestrator.step_async()     # render and write one frame

asyncio.ensure_future(go())
Figure 4. The initial response from the USD Code NIM, with Python code for populating the warehouse with lights

You can further improve on this by constraining the Y and Z locations of the light, checking that it has the right transform ops, and more. While this is an iterative process, using USD Code as a copilot can help you reach working code faster than writing it from scratch.

The final code will look something like this: 

import asyncio
import random

import omni.replicator.core as rep
import omni.usd
from pxr import Gf, UsdGeom

stage = omni.usd.get_context().get_stage()
camera = '/OmniverseKit_Persp'
render_product = rep.create.render_product(camera, (1024, 1024))

def move_light() -> None:
    """Randomly move the light named "/Root/RectLight_03" between 0 and -20 meters on only the x-axis."""
    light_prim = stage.GetPrimAtPath("/Root/RectLight_03")
    if not light_prim.IsValid():
        return  # light not found in the open stage; nothing to move
    new_x = random.uniform(-20, 0)  # random x value between -20 and 0
    translate_attr = light_prim.GetAttribute("xformOp:translate")
    if translate_attr:
        # Keep the current y and z; replace only x
        current = translate_attr.Get()
        y, z = (current[1], current[2]) if current is not None else (0.0, 0.0)
        translate_attr.Set(Gf.Vec3d(new_x, y, z))
    else:
        # No translate op yet: add one (registered in xformOpOrder) on the x axis
        UsdGeom.Xformable(light_prim).AddTranslateOp().Set(Gf.Vec3d(new_x, 0, 0))

# Initialize and attach writer
writer = rep.writers.get("BasicWriter")
writer.initialize(output_dir="_output", rgb=True, normals=True, distance_to_image_plane=True, semantic_segmentation=True)
writer.attach([render_product])

async def go(num_frames=10):
    for _ in range(num_frames):
        move_light()                            # randomize before each capture
        await rep.orchestrator.step_async()     # render and write one frame

asyncio.ensure_future(go())

Executing the code in the Replicator script editor randomly repositions the light along the x axis on every captured frame, varying the lighting across the warehouse (Figure 5).

Figure 5. As part of the scene randomization, warehouse lights turn on and off (top left), directed by code in the Replicator script editor (bottom right)

This sample code is just one of many examples of what you can do with USD Code NIM for domain randomization. You can keep iterating by adding more randomizations to the scene to increase the diversity of the dataset. You can also ask USD Code to structure its output as helper functions that are easy to reuse across different scenarios, which makes future runs more efficient.
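As an illustration of the kind of reusable helper you might ask for, the sketch below builds per-axis randomizers from a single factory. The name make_axis_randomizer and the plain-tuple translations are assumptions made for this example; in a Replicator script, the returned function's result would be written back to the prim's translate op, as move_light does.

```python
import random
from typing import Callable, Optional, Tuple

Vec3 = Tuple[float, float, float]

def make_axis_randomizer(axis: int, low: float, high: float,
                         rng: Optional[random.Random] = None) -> Callable[[Vec3], Vec3]:
    """Build a reusable function that resamples one axis of a translation in [low, high]."""
    if axis not in (0, 1, 2):
        raise ValueError("axis must be 0 (x), 1 (y), or 2 (z)")
    if low > high:
        raise ValueError("low must not exceed high")
    rng = rng if rng is not None else random.Random()

    def randomize(translation: Vec3) -> Vec3:
        values = list(translation)
        values[axis] = rng.uniform(low, high)   # only the chosen axis changes
        return (values[0], values[1], values[2])

    return randomize

# One helper, two scenarios: slide a light along x, lift a box along z.
rng = random.Random(7)
slide_light_x = make_axis_randomizer(0, -20.0, 0.0, rng)
lift_box_z = make_axis_randomizer(2, 0.0, 1.5, rng)
print(slide_light_x((0.0, 4.0, 6.0)))
print(lift_box_z((2.0, 3.0, 0.0)))
```

Validating the axis and range once, when the helper is built, keeps the per-frame function small and free of repeated checks.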

Export annotated images

With domain randomization set up in Replicator, you can now export the first batch of annotated images. Replicator has many out-of-the-box annotators, such as 2D bounding boxes, semantic segmentation, depth, and normals. The type of output you need (bounding boxes or segmentation, for example) depends on the model and use case. The data can be exported in a basic format using the BasicWriter, in KITTI format using the KittiWriter, or in COCO format using a custom writer.
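A custom writer ultimately has to map annotator output onto the target format. As a minimal sketch, assuming the tight 2D boxes have already been read into (class, x_min, y_min, x_max, y_max) tuples in pixel coordinates (a hypothetical intermediate layout, not Replicator's own data structure), the conversion to COCO-style annotation entries could look like this:

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: (class_name, x_min, y_min, x_max, y_max) per object.
Box = Tuple[str, float, float, float, float]

def to_coco(image_id: int, boxes: List[Box],
            categories: Dict[str, int], first_ann_id: int = 1) -> List[dict]:
    """Convert corner-format boxes into COCO-style annotation dicts."""
    annotations = []
    for offset, (name, x_min, y_min, x_max, y_max) in enumerate(boxes):
        width, height = x_max - x_min, y_max - y_min
        annotations.append({
            "id": first_ann_id + offset,
            "image_id": image_id,
            "category_id": categories[name],
            "bbox": [x_min, y_min, width, height],  # COCO boxes are [x, y, width, height]
            "area": width * height,                 # box area, used when no mask is available
            "iscrowd": 0,
        })
    return annotations

cats = {"forklift": 1, "pallet": 2}
anns = to_coco(0, [("forklift", 100, 200, 340, 420), ("pallet", 500, 610, 620, 660)], cats)
print(anns[0]["bbox"])  # [100, 200, 240, 220]
```

A complete COCO file also needs top-level images and categories lists; the Replicator writer documentation describes how to wrap conversion logic like this in a custom writer.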

More importantly, the data generated from Replicator captures various physical interactions like rigid body dynamics (movement and collisions, for example) or how light interacts in an environment. Figure 6 shows examples of the type of annotated data that can be exported from Replicator.  

Figure 6. Replicator can generate a variety of annotated data for a warehouse scene with a forklift, including (clockwise from top left) Normals, RGB, Depth, and Semantic Segmentation

Augment the synthetic dataset with ComfyUI

Further augmenting the synthetic dataset can create new variations, such as changing the background and adding texture and material details, all using text prompts. A wide range of variations is beneficial in itself, and the highly photorealistic results also help narrow the appearance domain gap. Overall, this stage offers big time savings: it reduces the time spent building datasets, lessens the need for large libraries of high-quality digital assets, and makes it easy to regenerate new variations during augmentation.

ComfyUI is a web-based backend and GUI for building and executing pipelines using diffusion models. It’s a powerful open-source tool available for download on GitHub. It can be used with SDXL, other diffusion models, or your fine-tuned model of choice. The broader community has created an ecosystem of extra techniques and features as nodes for ComfyUI that you can integrate into your graphs.

For a reference ComfyUI graph to get started, see the Generative AI for Digital Twins Guide.

Figure 7. A sample ComfyUI workflow, with its six steps outlined, that augments synthetically generated images using a diffusion model

At a high level, the graph can be thought of as “programming with models.” Regional prompting guides what the diffusion model generates in specific regions of the dataset images. The ControlNet nodes take in an outline image created from the normals and segmentation. Combined with regional prompting, the ControlNets are key to controlling variation while retaining the important structures needed for consistency across the dataset.
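The outline image fed to ControlNet can be derived partly from the segmentation output. As a minimal sketch, assuming the segmentation has been loaded as a 2D grid of integer class IDs, a pixel can be marked as an edge wherever its label differs from a 4-connected neighbor's. The reference graph may use a different edge detector and also draws on normals; this only illustrates the segmentation half of the idea.

```python
from typing import List

def segmentation_outline(labels: List[List[int]]) -> List[List[int]]:
    """Return a binary edge map: 1 where a pixel's label differs from a 4-connected neighbor."""
    h, w = len(labels), len(labels[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Comparing with the down and right neighbors and marking both pixels
            # covers all four directions without double-counting pairs.
            for dy, dx in ((1, 0), (0, 1)):
                ny, nx = y + dy, x + dx
                if ny < h and nx < w and labels[ny][nx] != labels[y][x]:
                    edges[y][x] = 1
                    edges[ny][nx] = 1
    return edges

labels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 2],
]
for row in segmentation_outline(labels):
    print(row)
# [0, 1, 1, 0]
# [1, 1, 1, 1]
# [1, 1, 1, 1]
```

In practice you would do this with an array library on full-resolution images; the nested loops here only keep the sketch dependency-free.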

The augmented outputs can be seen at the far right of Figure 7, after clicking Queue Prompt. The outputs combine the traditionally rendered synthetic data with augmented areas. The models are often able to pick up on lighting cues and context from the broader image, and appropriately light or shadow the augmented areas.

Details such as the type of flooring and object colors can be changed in an existing image. Four prompts are given below, with the resulting images shown in Figure 8.

Prompt 1

white tiled linoleum floor
green shiny new counterbalance forklift
wooden pallet light colored pine wood, softwood
garbage container

Prompt 2

dark cracked dirty concrete floor
yellow counterbalance forklift
wooden pallet light colored pine wood, softwood
black garbage container

Prompt 3

cracked concrete floor
white counterbalance forklift
wooden pallet light colored pine wood, softwood
garbage container

Prompt 4

green chipped linoleum floor
blue rusty counterbalance forklift
wooden pallet light colored pine wood, softwood
garbage container
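The four prompts above each change a few regions at a time. One way to scale this up, sketched below as an assumption rather than as part of the reference ComfyUI workflow, is to treat each region's phrases as independent options and enumerate every combination; each resulting region-to-prompt mapping would then be fed to the matching regional prompt nodes.

```python
import itertools
from typing import Dict, List

# Per-region options taken from the four prompts above; extend with your own phrases.
REGION_OPTIONS: Dict[str, List[str]] = {
    "floor": ["white tiled linoleum floor", "dark cracked dirty concrete floor",
              "cracked concrete floor", "green chipped linoleum floor"],
    "forklift": ["green shiny new counterbalance forklift", "yellow counterbalance forklift",
                 "white counterbalance forklift", "blue rusty counterbalance forklift"],
    "pallet": ["wooden pallet light colored pine wood, softwood"],
    "container": ["garbage container", "black garbage container"],
}

def regional_prompt_sets(options: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Enumerate every combination of per-region prompts as region -> text mappings."""
    regions = list(options)
    return [dict(zip(regions, combo))
            for combo in itertools.product(*(options[r] for r in regions))]

prompt_sets = regional_prompt_sets(REGION_OPTIONS)
print(len(prompt_sets))  # 4 * 4 * 1 * 2 = 32
```

Because the number of combinations grows multiplicatively, sampling a subset of these mappings is usually enough for a given augmentation run.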

Figure 8. Synthetically generated images showing different types of flooring and forklift colors (green, blue, yellow, white)

Training the model 

Although not covered explicitly in this post, the logical next step is to train a computer vision model. You can start with an NVIDIA pretrained computer vision model or select your own. The pretrained model can then be fine-tuned using a training framework such as NVIDIA TAO, which is built on TensorFlow and PyTorch and uses transfer learning to speed up model training. Just as you would with real data, you’ll likely need several iterations before you’re satisfied with the model KPIs.

Because the pipeline is already set up, you can return to the 3D simulation environment, generate new data by changing additional parameters, and augment it with the ComfyUI workflow. The automated pipeline reduces the time needed to generate and label new data, which is often a bottleneck when training models.

Summary 

You can quickly and easily build custom synthetic data generation pipelines using NVIDIA NIM microservices with NVIDIA Omniverse Replicator, as explained in this post. You can further augment the generated images using ComfyUI. These generated images can then be used with pretrained models and tools such as NVIDIA TAO, PyTorch, or TensorFlow.

We’re excited to see how you use this workflow to develop your own SDG pipeline. To get started, check out the detailed end-to-end workflow.

At SIGGRAPH 2024, NVIDIA CEO Jensen Huang sat down for fireside chats with Meta founder and CEO Mark Zuckerberg and WIRED Sr. Writer Lauren Goode. Watch the fireside chats and other sessions from NVIDIA at SIGGRAPH 2024 on demand. 
