How can an AI system understand the difference between a plausible accident and a physically impossible event? Or plan a multi-step interaction across humans, objects, and environments in an edge-case scenario? These questions sit at the core of physical intelligence: the kind of intelligence that underpins how robots manipulate the world, how autonomous vehicles make split-second decisions, and how virtual agents simulate reality.
NVIDIA Cosmos Reason is a world foundation model (WFM) for physical AI—built not just to see, but to reason. Trained to understand space, time, and physics, it can critique synthetic data and build curated datasets to train embodied AI systems like robots and autonomous vehicles to act more realistically. This post covers how Cosmos Reason is developed, where it’s used, and how you can use openly available model checkpoints and scripts to run the model for physical AI tasks.
Recap: NVIDIA Cosmos world foundation models for physical AI
Cosmos is a WFM development platform. At its core are Cosmos WFMs: pretrained, multimodal models designed to understand and generate world states as video, replicating physical environments to train physical AI systems.
These models learn from over 20M hours of robotics and driving data, enabling them to predict how environments change over time or adapt scenes to new conditions. With NVIDIA Cosmos Predict, developers can generate future frames from text, images, or video. With NVIDIA Cosmos Transfer, they can relight or change environments in videos to develop diverse, physics-aware training data at scale. Cosmos also provides tools to curate data, tokenize it, and post-train the models for specific robots, autonomous systems, or other downstream tasks.
Cosmos Reason for scalable robotics training data
First unveiled at NVIDIA GTC 2025, Cosmos Reason is now available to transform how synthetic data is generated and curated for training physical AI systems. It is an open, spatiotemporally aware reasoning model that interprets visual input, analyzes it in the context of a provided text prompt, applies chain-of-thought reasoning to evaluate and reward candidate responses, and generates optimal decisions or captions.

Inside Cosmos Reason
Cosmos Reason is built using supervised fine-tuning (SFT) and reinforcement learning that bridges multimodal perception and real-world decision-making:
- Physical AI SFT: Focuses on real-world reasoning. Learns object affordances (e.g., “a pan conducts heat”), action chains (multi-step plans), and spatial feasibility (e.g., “a person can’t walk through walls”) using curated physical interaction datasets.
- Reinforcement learning for embodied decisions: The long chain-of-thought reasoning capability in Cosmos Reason enables training with a small training set while still generalizing to held-out test scenarios. Verifiable physical AI rewards, such as an "arrow-of-time" reward, enable the model to learn world dynamics without human annotations (see the sketch below).
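To make the reward idea concrete, here is a minimal sketch of what a verifiable "arrow-of-time" reward could look like. The helper names and model interface are hypothetical and not the actual Cosmos Reason training code; the point is that the label (forward versus reversed playback) comes from how the clip is constructed, so no human annotation is needed.

```python
import random

def arrow_of_time_reward(model, clip_frames):
    """Reward 1.0 if the model correctly detects whether a clip plays forward or in reverse.

    `model.answer(frames, prompt)` is a hypothetical inference wrapper; the ground-truth
    label comes from whether we reversed the frames ourselves, so no human labels are needed.
    """
    reversed_clip = random.random() < 0.5                   # flip a coin per training sample
    frames = clip_frames[::-1] if reversed_clip else clip_frames
    prompt = "Is this video playing forward or in reverse? Answer 'forward' or 'reverse'."
    answer = model.answer(frames, prompt).strip().lower()   # hypothetical model API
    target = "reverse" if reversed_clip else "forward"
    return 1.0 if answer.startswith(target) else 0.0
```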
Testing Cosmos Reason on common sense
Cosmos Reason excels at understanding real-world physical situations—like how objects and people interact in dynamic environments—using both video and text. Evaluated across benchmarks like BridgeData V2, RoboVQA, and Agibot, the model shows strong common-sense reasoning and situational awareness.
Fine-tuning on physical AI tasks boosts the base vision-language model’s performance by over 10%, while reinforcement learning adds another 5% gain. On average, Cosmos Reason achieves a score of 65.7 across key benchmarks, setting a high bar for AI systems in robotics, autonomous vehicles, and embodied agents.
There’s still room for improvement: post-training on high-quality, task-specific curated data and continued reinforcement learning can further enhance performance of Cosmos Reason.
Cosmos Reason scores across physical AI benchmarks:

| Common Sense | BridgeData V2 | RoboVQA | Agibot | HoloAssist | AV | RoboFail | Avg. |
|---|---|---|---|---|---|---|---|
| 56.2 | 73.5 | 86.8 | 54.2 | 60.0 | 67.0 | 62.0 | 65.7 |
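As a quick sanity check, the reported average follows directly from the per-benchmark scores: (56.2 + 73.5 + 86.8 + 54.2 + 60.0 + 67.0 + 62.0) / 7 ≈ 65.7.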
How to use Cosmos Reason
Developers can download the model checkpoints from Hugging Face and get the inference and post-training scripts from GitHub.
The model takes a low-resolution video input (for example, 604×480) along with a text prompt that specifies the developer's intent, such as a question or a request for an explanation, guiding the model to reason and respond accordingly. Developers can also use the prompt upsampler model to improve text prompts.
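As a starting point, the snippet below is a minimal, hedged inference sketch. The checkpoint name (nvidia/Cosmos-Reason1-7B) and the Qwen2.5-VL-compatible interface are assumptions based on the published model cards; check the official inference scripts on GitHub for the supported entry points and parameters.

```python
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

MODEL_ID = "nvidia/Cosmos-Reason1-7B"  # assumed checkpoint name; verify on Hugging Face

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# A low-resolution video plus a text prompt that describes the developer's intent.
messages = [{
    "role": "user",
    "content": [
        {"type": "video", "video": "file:///path/to/clip.mp4", "fps": 4},
        {"type": "text", "text": "Is the robot's grasp in this clip physically plausible? Reason step by step."},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

# Generate the chain-of-thought response and decode only the newly generated tokens.
output_ids = model.generate(**inputs, max_new_tokens=512)
answer = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```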
Cosmos WFMs, including Cosmos Reason, are optimized for best performance on NVIDIA AI infrastructure. To run the models, developers can set up a Docker environment or run them in their own environment.
For larger industrial workloads and to run vision AI pipelines, developers can use the power of NVIDIA Blackwell GB200 on NVIDIA DGX Cloud and run accelerated inference on NVIDIA Hopper H100 or NVIDIA Ampere A100 GPUs using inference scripts.
Cosmos WFMs power scalable synthetic data generation pipelines that help train robotic systems with greater efficiency and coverage than traditional methods.
Cosmos Reason generates diverse, realistic prompts for Cosmos Predict and curates high-quality synthetic data from video using text-based controls. Together, they power workflows like NVIDIA Isaac GR00T Dreams to produce physically accurate motion data at scale.
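To illustrate the curation step, here is a hedged sketch of using Cosmos Reason as a critic that scores and filters synthetic clips before they enter a training set. The `reason_model.answer()` wrapper, prompt wording, and threshold are illustrative assumptions; the official curation scripts on GitHub define the actual criteria.

```python
def score_clip(reason_model, clip_path):
    """Ask the reasoning model to rate the physical plausibility of a clip from 0 to 10."""
    prompt = (
        "Rate from 0 to 10 how physically plausible the motion in this video is "
        "(object contacts, gravity, temporal continuity). Reply with only the number."
    )
    reply = reason_model.answer(clip_path, prompt)  # hypothetical inference wrapper
    try:
        return float(reply.strip())
    except ValueError:
        return 0.0  # unparsable replies are treated as rejections

def curate(reason_model, clip_paths, threshold=7.0):
    """Keep only the clips the critic considers physically plausible."""
    return [clip for clip in clip_paths if score_clip(reason_model, clip) >= threshold]
```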
Integrated with NVIDIA Omniverse for high-fidelity simulation, Cosmos streamlines the entire loop—from data generation to deployment—accelerating robotics development beyond the limits of real-world data.
Get started
Download the model from Hugging Face to start experimenting with model checkpoints.
Access inference and post-training scripts on GitHub to customize for your own data.
Explore the Cosmos documentation for in-depth tutorials, implementation details, and practical use cases.
Watch the COMPUTEX keynote from NVIDIA founder and CEO Jensen Huang, as well as NVIDIA GTC Taipei 2025 sessions.
Tune into our upcoming OpenUSD Insiders livestream, Wednesday, May 28, at 11 am PDT for a recap of the Cosmos Reason release and other top physical AI announcements from NVIDIA GTC Taipei at COMPUTEX.
Stay up to date by subscribing to NVIDIA news and following NVIDIA Omniverse on Discord and YouTube.
- Visit our Omniverse developer page to get all the essentials you need to get started
- Access a collection of OpenUSD resources, including the new self-paced Learn OpenUSD training curriculum
- Connect with the Omniverse Developer Community
Get started with developer starter kits to quickly develop and enhance your own applications and services.