NVIDIA Cosmos for Developers
NVIDIA Cosmos™ is a platform of state-of-the-art generative world foundation models, advanced tokenizers, guardrails, and an accelerated data processing and curation pipeline for autonomous vehicle (AV) and robotics developers.
Build, evaluate, deploy, and simulate physical AI models faster while minimizing testing and validation risks in the real world.
See Cosmos World Foundation Models in Action
Cosmos world foundation models (WFMs) generate high-fidelity, physics-aware video from simple inputs, simulating and predicting real-world outcomes for robotics and autonomous systems.
NVIDIA Cosmos World Foundation Models
The first wave of pre-trained models for generating physics-aware videos and world states is now openly available to developers.
NVIDIA Cosmos has built-in guardrails that filter brands, unsafe content, and harmful prompts from Cosmos-generated outputs. Cosmos also has guardrails to blur human faces, post-guards to remove questionable scenarios, and digital watermarks on synthetic videos generated from NVIDIA NIM™ microservices.
Autoregressive
Predict future frames in a video sequence, leveraging temporal dependencies to generate coherent and realistic motion.
Cosmos Super:
- Cosmos-1.0-Autoregressive-4B
- Cosmos-1.0-Autoregressive-5B-Video2World
- Cosmos-1.0-Autoregressive-12B
- Cosmos-1.0-Autoregressive-13B-Video2World
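The autoregressive idea described above can be illustrated with a toy rollout loop. This is a minimal sketch, not the Cosmos API: the names `rollout` and `linear_extrapolation` are hypothetical, frames are plain lists of numbers, and a simple linear extrapolation stands in for a learned model.

```python
from typing import Callable, List

Frame = List[float]  # toy "frame": a flat list of pixel intensities


def linear_extrapolation(context: List[Frame]) -> Frame:
    """Hypothetical stand-in for a learned model: continue the last observed motion."""
    prev, last = context[-2], context[-1]
    return [2 * b - a for a, b in zip(prev, last)]


def rollout(
    context: List[Frame],
    predict: Callable[[List[Frame]], Frame],
    steps: int,
) -> List[Frame]:
    """Generate `steps` future frames, feeding each prediction back in as context."""
    frames = list(context)
    for _ in range(steps):
        frames.append(predict(frames))
    return frames[len(context):]


observed = [[0.0, 1.0], [1.0, 1.5]]
future = rollout(observed, linear_extrapolation, steps=3)
print(future)  # [[2.0, 2.0], [3.0, 2.5], [4.0, 3.0]]
```

The key point is the feedback loop: every generated frame becomes part of the context for the next prediction, which is how temporal dependencies carry forward through the sequence.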
Diffusion
Create videos by progressively refining random noise into coherent video frames through iterative denoising guided by learned temporal and spatial patterns.
Cosmos Super:
- Cosmos-1.0-Diffusion-7B-Text2World
- Cosmos-1.0-Diffusion-7B-Video2World
- Cosmos-1.0-Diffusion-14B-Text2World
- Cosmos-1.0-Diffusion-14B-Video2World
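The iterative denoising described above can be sketched with a toy refinement loop. This is an illustration under strong assumptions, not the sampler Cosmos uses: the "noise predictor" here is a hypothetical oracle that already knows the clean signal, whereas a real diffusion model learns to estimate the noise from data. All function names are made up for the example.

```python
import random
from typing import Callable, List

Signal = List[float]  # toy stand-in for a video frame


def make_oracle_noise_predictor(clean: Signal) -> Callable[[Signal, int], Signal]:
    """Hypothetical oracle: returns the exact gap between the sample and the clean signal."""
    def predict_noise(x: Signal, step: int) -> Signal:
        return [xi - ci for xi, ci in zip(x, clean)]
    return predict_noise


def iterative_denoise(
    x: Signal,
    predict_noise: Callable[[Signal, int], Signal],
    steps: int,
) -> Signal:
    """Refine a noisy sample over `steps` iterations, counting down to step 1."""
    for step in range(steps, 0, -1):
        noise = predict_noise(x, step)
        # Remove 1/step of the estimated noise; the final step removes what is left.
        x = [xi - ni / step for xi, ni in zip(x, noise)]
    return x


random.seed(0)
clean_frame = [0.2, 0.8, 0.5, 0.1]
noisy_start = [random.gauss(0.0, 1.0) for _ in clean_frame]
result = iterative_denoise(noisy_start, make_oracle_noise_predictor(clean_frame), steps=10)
```

Each pass moves the sample a little closer to a coherent result, which is the sense in which generation starts from random noise and is progressively refined.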
Workflow Enablers
Essential models that simplify the development and deployment of world models in physical AI applications.
Cosmos Super:
- Cosmos-1.0-Guardrail
State-of-the-art model combining pre- and post-generation guards to ensure safety and consistency.
- Cosmos-1.0-PromptUpsampler-12B-Text2World
Enhances prompt quality by improving text prompt descriptions and details automatically.
- Cosmos-1.0-Diffusion-7B-Decoder
Decodes autoregressive video sequences for augmented reality.
Fine-Tuned Samples
- Cosmos-1.0-Diffusion-7B-Text2World-Sample-MultiviewDriving
Fine-tuned for AV multi-sensor driving views. Coming soon.
Introducing Cosmos for Physical AI Development
Get an introduction to the models, tools, and capabilities of the Cosmos platform to accelerate the development of physical AI systems such as robots and autonomous vehicles.
Building Custom World Models With NVIDIA NeMo
Explore new NVIDIA NeMo capabilities for customizing video foundation models, from data curation and model tuning to the inference pipeline.
Coming Soon
Open Cosmos World Foundation Models
Open Cosmos world foundation models and tokenizers are enabling developers to build physical AI without high entry costs.
Starter Kits
Start developing world models with Cosmos by accessing open models, fine-tuning tutorials, and how-to guides for downstream applications and the various stages of physical AI development.
Starter Kits by Use Case
Synthetic Data Generation
Build and deploy world models to generate virtually unlimited domain-specific synthetic data.
Policy Model Development
Fine-tune Cosmos WFMs to build policy models for mapping a physical AI system’s states to optimal actions based on learned behavior or rules.
Policy Model Validation
Cosmos WFMs fine-tuned on validation data can accelerate initialization, evaluation, validation, and benchmarking of policy models before real-world deployment.
Starter Kits by Model Development Stage
Process and Curate Video Data
NeMo Curator generates high-quality training data with scalable pipelines that efficiently handle 100+ PB of data. With out-of-the-box-optimized performance that delivers a 35X speedup, NeMo Curator minimizes processing costs and accelerates time-to-market.
Tokenize Training Data
Cosmos tokenizers for images and videos offer up to 8X better compression and 12X faster speeds than open tokenizers, reducing computational costs.
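To see why compression matters for compute cost, the arithmetic below shows how temporal and spatial downsampling factors shrink the grid a downstream model has to process. The clip size and the factors (8 in time, 16 in each spatial dimension) are illustrative assumptions, not figures from this page, and `latent_grid` is a hypothetical helper that uses plain integer division rather than any particular tokenizer's exact rounding.

```python
from typing import Tuple

Shape = Tuple[int, int, int]  # (frames, height, width)


def latent_grid(
    frames: int, height: int, width: int, temporal_factor: int, spatial_factor: int
) -> Shape:
    """Approximate latent grid size after downsampling in time and space."""
    return (
        frames // temporal_factor,
        height // spatial_factor,
        width // spatial_factor,
    )


def position_count(shape: Shape) -> int:
    """Number of positions in a (frames, height, width) grid."""
    t, h, w = shape
    return t * h * w


clip = (32, 256, 256)  # assumed input clip: 32 frames of 256x256
latents = latent_grid(*clip, temporal_factor=8, spatial_factor=16)
reduction = position_count(clip) / position_count(latents)
print(latents, reduction)  # (4, 16, 16) 2048.0
```

Under these assumed factors, the model works on roughly 2,000 times fewer positions than the raw pixel grid, which is where the savings in training and inference cost come from.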
Train and Customize
NVIDIA NeMo accelerates the development of world models by efficiently training and fine-tuning multimodal models at scale with popular customization techniques like LoRA and SFT.
- Customize Diffusion and Autoregressive Models
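For readers unfamiliar with LoRA (low-rank adaptation), the sketch below illustrates the general idea in plain Python: the pre-trained weight matrix stays frozen, and only two small matrices are trained, whose product is added as a scaled update. This is not NeMo's API; the function names and the toy matrices are assumptions made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w: Matrix, lora_b: Matrix, lora_a: Matrix, alpha: float, rank: int) -> Matrix:
    """Effective weight W + (alpha / rank) * B @ A, with W left unchanged."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (d_out x rank) @ (rank x d_in)
    return [[wij + scale * dij for wij, dij in zip(wr, dr)] for wr, dr in zip(w, delta)]


def param_counts(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """Trainable parameters for full fine-tuning versus a rank-`rank` LoRA update."""
    return d_out * d_in, rank * (d_in + d_out)


frozen = [[1.0, 0.0], [0.0, 1.0]]  # toy pre-trained weight
b = [[1.0], [2.0]]                 # d_out x rank
a = [[0.5, 0.0]]                   # rank x d_in
adapted = lora_weight(frozen, b, a, alpha=2.0, rank=1)
print(adapted)                        # [[2.0, 0.0], [2.0, 1.0]]
print(param_counts(4096, 4096, 16))   # (16777216, 131072)
```

For a 4096 x 4096 layer at rank 16, the low-rank update trains 128 times fewer parameters than full fine-tuning of that layer, which is why the technique is popular for customizing large models.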
NVIDIA Cosmos Learning Library
More Resources
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this model in accordance with our terms of service, developers should work with their internal model team to ensure it meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the System Card, Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI concerns here.
