
How to Scale Data Generation for Physical AI with the NVIDIA Cosmos Cookbook

Building powerful physical AI models requires diverse, controllable, and physically grounded data at scale. Collecting large-scale, diverse real-world datasets for training can be expensive, time-intensive, and dangerous. NVIDIA Cosmos open world foundation models (WFMs) address these challenges by enabling scalable, high-fidelity synthetic data generation for physical AI and augmentation of existing datasets.

The NVIDIA Cosmos Cookbook is a comprehensive guide for using Cosmos WFMs and tools. It includes step-by-step recipes for inference, curation, post-training, and evaluation. 

For scalable data-generation workflows, the Cookbook includes a variety of recipes based on NVIDIA Cosmos Transfer, a world-to-world style transfer model. In this post, we’ll walk through Cosmos Transfer recipes that change video backgrounds, add new environmental conditions to driving data, and generate data for use cases such as robotics navigation and urban traffic scenarios.

Augmenting video data

To scale existing, real datasets, developers often look to generate realistic variations of the same scene by modifying backgrounds, lighting, or object properties without breaking temporal consistency.

The Multi-Control Recipes section in the Cookbook demonstrates how to use various control modalities to perform guided video augmentations with Cosmos Transfer. Additionally, the core concepts explain how strategically combining different control modalities is key to achieving high-fidelity, structurally consistent video results. Developers can use depth, edge, segmentation, and vis controls—along with text prompts—to precisely tweak video attributes such as background, lighting, object geometry, color, or texture while maintaining the temporal and spatial consistency of specified regions.

This recipe is especially valuable for robotics development, where capturing human gestures (e.g., waving or greeting) across different environments and conditions is costly and time-consuming.

Control modalities

  • Depth: Maintains 3D realism and spatial consistency by respecting distance and perspective.
  • Segmentation: Used to completely transform objects, people, or backgrounds.
  • Edge: Preserves the original structure, shape, and layout of the video.
  • Vis: Applies a smoothing/blur-based control by default, so the underlying visual characteristics of the source remain unchanged.

Technical overview

  • Control fusion: Combines multiple conditioning signals (edge, seg, vis) to balance geometric preservation and photorealistic synthesis.
  • Mask-aware editing: Binary or inverted masks define editable regions, ensuring localized transformations.
  • Parameterization: Each modality’s influence is tuned via control_weight in JSON configs, enabling reproducible control across editing tasks.

Core recipes

1. Background change: Replace with realistic backgrounds using filtered_edge, seg (mask_inverted), and vis to preserve subject motion.

A GIF of a person waving as the background changes to a deep blue ocean using Cosmos Transfer
Figure 1. Background change using Cosmos Transfer

2. Lighting change: Modify illumination conditions (e.g., day to night, indoor to outdoor) using edge + vis.

GIF of a person waving as the lighting changes using Cosmos Transfer
Figure 2. Lighting change using Cosmos Transfer

3. Color/texture change: Alter surface appearance with pure edge control for stable structure retention. This preserves all other structures as defined by object edges.

A GIF of a person waving as his black t-shirt color changes to red using Cosmos Transfer
Figure 3. Color and texture change using Cosmos Transfer

4. Object change: Transform object class or shape using low-weight edge, high-weight seg (mask), and moderate vis.

A GIF of a humanoid sorting fruits and vegetables in a lab, with some items changing into packaged food using Cosmos Transfer.
Figure 4. Object change using Cosmos Transfer

Example commands

Get started with Cosmos Transfer 2.5 here. You can find the configurations for all the core recipes used in this tutorial here.
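
As a concrete illustration, here is roughly what a background-change configuration might look like. It mirrors the parameter format used later in this post for the autonomous driving recipe; the file names, the mask_inverted flag as a JSON key, and the specific weights are illustrative assumptions, so refer to the linked configurations for the exact schema.

{
    "seed": 1000,
    "prompt_path": "assets/prompt_background_change.json",   // hypothetical prompt file describing the new background
    "video_path": "assets/person_waving_input.mp4",          // hypothetical input clip
    "guidance": 3,
    "edge": {
        "control_weight": 0.3        // filtered edge map preserves the subject's structure and motion
    },
    "seg": {
        "control_weight": 0.7,       // segmentation drives the background replacement
        "mask_inverted": true        // assumption: edit the background region, keep the subject
    },
    "vis": {
        "control_weight": 0.2        // keeps overall tones close to the source
    }
}

The other core recipes follow the same pattern with different weightings: lighting changes lean on edge plus vis, color and texture edits use edge alone, and object changes combine a low edge weight with a high seg weight and moderate vis.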

Generating new environments for autonomous driving development

A GIF of nine different car-and-city scenes across weather and lighting, showing domain adaptation and synthetic data augmentation for autonomous driving with Cosmos Transfer.
Figure 5. Cosmos Transfer output showcasing domain adaptation and synthetic data augmentation in autonomous driving use cases

This recipe collection demonstrates how Cosmos Transfer can be used for domain adaptation and synthetic data augmentation in autonomous vehicle (AV) research. By transforming real-world or simulated driving videos across diverse environmental conditions, developers can create rich datasets for training more robust perception or planning models.

Technical overview

  • Multi-control inference: The pipeline combines four control modalities—depth, edge, seg, and vis—each with tunable control_weight parameters to balance realism, structure, and semantic fidelity.
  • Prompt-conditioned generation: Text prompts define conditions such as “night with bright street lamps,” “winter with heavy snow,” or “sunset with reflective roads.”

Example configuration for base parameters

{
    // Update the parameter values for control weights, seed, and guidance in this JSON file
    "seed": 5000,
    "prompt_path": "assets/prompt_av.json",           // Update the prompt in the json file accordingly
    "video_path": "assets/av_car_input.mp4",
    "guidance": 3,
    "depth": {
        "control_weight": 0.4
    },
    "edge": {
        "control_weight": 0.1
    },
    "seg": {
        "control_weight": 0.5
    },
    "vis": {
        "control_weight": 0.1
    }
}
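
The config references a separate prompt file (assets/prompt_av.json) whose schema isn't shown in this post. The sketch below is only an assumption of what such a file might contain, using one of the conditions mentioned above; check the cookbook assets for the real structure.

{
    // Hypothetical structure; the actual prompt file in the cookbook may differ
    "prompt": "The same driving scene at night with bright street lamps, wet reflective roads, and light rain, keeping all vehicles and their trajectories unchanged."
}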

More example commands for this workflow can be found here.

Making robots more mobile with Sim2Real data augmentation

Three GIFs of a warehouse scene with RGB image and segmentation mask on top, and a photorealistic Cosmos rendering below.
Figure 6. Input RGB video and segmentation mask (top) and photorealistic output from Cosmos Transfer 1 (bottom)

Robotics navigation models often struggle to generalize from simulation to reality due to visual and physical domain gaps. The Sim2Real Data Augmentation recipe demonstrates how Cosmos Transfer improves Sim2Real performance for mobile robots by generating photorealistic, domain-adapted data from simulation. 

Technical overview

The pipeline integrates with NVIDIA X-Mobility and Mobility Gen:

  • Mobility Gen: Built on Isaac Sim, it generates high-fidelity datasets with RGB, depth, and segmentation ground truth for wheeled and legged robots.
  • X-Mobility: Learns navigation policies from both on-policy and off-policy data.
  • Cosmos Transfer: Applies multimodal controls (edge: 0.3, seg: 1.0) to vary lighting, materials, and textures while preserving geometry, motion, and annotations (see the configuration sketch after Figure 7).

Side-by-side images of a mobile robot navigating a taped path; with Cosmos, it detects and avoids a transparent bin that the baseline system does not.
Figure 7. The navigation model trained on Cosmos-augmented data identifies the transparent obstacle and navigates around it, demonstrating enhanced perception for challenging transparent objects
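
Translated into the configuration format shown earlier for the autonomous driving recipe, those control settings would look roughly like the sketch below. Only the edge and seg weights (0.3 and 1.0) come from the recipe; the paths and prompt file are illustrative placeholders, and the Sim2Real recipe runs on Cosmos Transfer 1, so its exact schema may differ.

{
    "prompt_path": "assets/prompt_warehouse_realistic.json",   // placeholder prompt describing photorealistic lighting and materials
    "video_path": "data/x_mobility_isaac_sim_nav2_100k_input_videos/episode_0000.mp4",   // placeholder clip from the converted dataset
    "edge": {
        "control_weight": 0.3       // preserves scene geometry and robot motion
    },
    "seg": {
        "control_weight": 1.0       // keeps semantic layout aligned with the ground-truth annotations
    }
}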

Example command to prepare inputs for Cosmos Transfer

uv run scripts/examples/transfer1/inference-x-mobility/xmob_dataset_to_videos.py data/x_mobility_isaac_sim_nav2_100k data/x_mobility_isaac_sim_nav2_100k_input_videos
uv run scripts/examples/transfer1/inference-x-mobility/xmob_dataset_to_videos.py data/x_mobility_isaac_sim_random_160k data/x_mobility_isaac_sim_random_160k_input_videos

More example commands for this workflow can be found here.

Generating synthetic data for smart city applications

A reference architecture diagram showing the synthetic data generation pipeline for smart city applications
Figure 8. Synthetic data generation pipeline for smart city applications

Also included in the Cookbook is an end-to-end workflow that generates photorealistic synthetic data for urban traffic scenarios, accelerating the development of perception and vision-language models (VLMs) for smart city applications. The workflow simulates dynamic city traffic scenes in CARLA; the rendered videos are then processed through Cosmos Transfer to produce high-quality, visually authentic videos and annotated datasets.

Daytime synthetic video of a busy city intersection with multiple lanes of cars and a few pedestrians waiting at crosswalks, captured from an elevated camera angle as vehicles move through green traffic lights under clear skies.
Figure 9. Synthetic video of a busy traffic intersection during daytime

Access the synthetic data generation workflow here.

In synthetic data generation, assessing the quality of generated content is essential to ensure realistic and reliable results. Read this case study that demonstrates how Cosmos Reason, a reasoning vision language model, can be used to assess physical plausibility—evaluating whether the interactions and movements in synthetic videos align with the fundamental laws and constraints of real-world physics.

How to use and contribute your own synthetic data generation recipe

To use the Cosmos Cookbook, start by exploring the inference or post-training recipes, which provide step-by-step instructions for tasks like video generation, sim-to-real augmentation, or model training. Each recipe outlines a workflow and points you to the relevant executable scripts in the scripts/ directory.

For deeper background on topics such as control modalities, data curation, or evaluation, see the concepts guides. All recipes include setup requirements and command examples to help you reproduce or adapt results.

As an open source community platform, the Cosmos Cookbook brings together NVIDIA engineers, researchers, and developers to share practical techniques and extend the ecosystem through collaboration. Contributors are welcome to add new recipes, refine workflows, and share insights to advance post-training and deployment best practices for Cosmos models. Follow the steps below to contribute to the main Cookbook repository.

1. Fork and set up

Fork the Cosmos Cookbook repository, then clone and configure:

git clone https://github.com/YOUR-USERNAME/cosmos-cookbook.git
cd cosmos-cookbook
git remote add upstream https://github.com/nvidia-cosmos/cosmos-cookbook.git

# Install dependencies
just install

# Verify setup
just serve-internal  # Visit http://localhost:8000

2. Create a branch

git checkout -b recipe/descriptive-name  # or docs/, fix/, etc.

3. Make changes

Add your content following the provided templates, then test:

just serve-internal  # Preview changes
just test            # Run validation

4. Commit and push

git add .
git commit -m "Add Transfer weather augmentation recipe"
git push origin recipe/descriptive-name

5. Create pull request

Open a pull request from your fork and submit it for review.

6. Address feedback

Update your branch based on review comments:

git add .
git commit -m "Address review feedback"
git push origin recipe/descriptive-name

The PR updates automatically. Once approved, the team will merge your contribution.

7. Sync your fork

Before starting new work:

git checkout main
git fetch upstream
git merge upstream/main
git push origin main

More details on templates and guidelines can be found here.

Get started

Explore more recipes with the Cosmos Cookbook for your own use cases.

The Cosmos Cookbook is designed to create a dedicated space where the Cosmos team and community can openly share and contribute practical knowledge. We’d love to receive your patches and contributions to help build this valuable resource together. Learn more about how to contribute.

Learn more about NVIDIA Research at NeurIPS.

At the forefront of AI innovation, NVIDIA Research continues to push the boundaries of technology in machine learning, self-driving cars, robotics, graphics, simulation, and more. Explore the cutting-edge breakthroughs now.

Stay up to date by subscribing to NVIDIA news, following NVIDIA AI on LinkedIn, Instagram, X, and Facebook, and joining the NVIDIA Cosmos forum.
