Autonomous vehicle (AV) research is undergoing a rapid shift. The field is being reshaped by the emergence of reasoning-based vision–language–action (VLA) models that bring human-like thinking to AV decision-making. These models can be viewed as implicit world models operating in a semantic space, allowing AVs to solve complex problems step-by-step and to generate reasoning traces that mirror human thought processes. This shift extends beyond the models themselves: traditional open-loop evaluation is no longer sufficient to rigorously assess such models, and new evaluation tools are required.
Recently, NVIDIA introduced Alpamayo, a family of models, simulation tools, and datasets to enable development of reasoning-based AV architectures. Our goal is to provide researchers and developers with a flexible, fast, and scalable platform for evaluating, and ultimately training, modern reasoning-based AV architectures in realistic closed-loop settings.
In this blog, we introduce Alpamayo and how to get up and running with reasoning-based AV development:
- Part 1: Introducing NVIDIA Alpamayo 1, an open 10B-parameter reasoning VLA model, and showing how to use it to generate trajectory predictions and review the corresponding reasoning traces.
- Part 2: Introducing the Physical AI dataset, one of the largest and most geographically diverse open AV datasets available, for training and evaluating these models.
- Part 3: Introducing NVIDIA AlpaSim, an open-source, closed-loop simulation tool designed for evaluating end-to-end models.
These three key components provide the essential pieces needed to start building reasoning-based VLA models: a base model, large-scale data for training, and a simulator for testing and evaluation.
Part 1: Alpamayo 1, an open reasoning VLA for AVs
Get started with the Alpamayo reasoning VLA model in just three steps.
Step 1: Access Alpamayo model weights and code
The Hugging Face repository contains pretrained model weights, which can be loaded with the corresponding code on GitHub.
Step 2: Prepare your environment
The Alpamayo GitHub repository contains steps to set up your development environment, including setting up uv (if not already installed) and creating a Python virtual environment.
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"
# Setup the virtual environment
uv venv ar1_venv
source ar1_venv/bin/activate
# Install pip in the virtual environment (if missing)
./ar1_venv/bin/python -m ensurepip
# Install Jupyter notebook package
./ar1_venv/bin/python -m pip install notebook
uv sync --active
Finally, because the model requires access to gated Hugging Face resources, request access here and get your Hugging Face token here. Then, authenticate with:
hf auth login
Step 3: Run the Alpamayo reasoning VLA
The model repository includes a notebook that will download the Alpamayo model weights, load some example data from the NVIDIA PhysicalAI-AV Dataset, run the model on it, and visualize the output trajectories and their associated reasoning traces.
In particular, the example data shows the ego vehicle passing a construction zone, with four timesteps (columns) from four cameras (front_left, front_wide, front_right, and front_tele, one per row) visualized below.

After running this through the Alpamayo model, an example output you may see in the notebook is “Nudge to the left to increase clearance from the construction cones encroaching into the lane,” with the corresponding predicted trajectory and ground truth trajectory visualized below.

To produce more trajectories and reasoning traces, change the num_traj_samples=1 argument in the inference call to a higher number.
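For a feel of what the inference call looks like outside the notebook, here is a minimal sketch. The notebook in the Alpamayo repository shows the exact API; apart from num_traj_samples, the helper and attribute names below are illustrative placeholders, not the actual Alpamayo interface.
# Hypothetical sketch only: see the repository notebook for the real API.
# Everything except num_traj_samples is a placeholder name.
from alpamayo import load_model, load_example_clip  # placeholder helpers

model = load_model()                      # downloads the gated Hugging Face weights
clip = load_example_clip()                # 4 cameras x 4 timesteps, as shown above

# Sampling more than one trajectory yields multiple candidates, each paired
# with its own reasoning trace.
outputs = model.infer(clip, num_traj_samples=5)
for trajectory, reasoning in zip(outputs.trajectories, outputs.reasoning_traces):
    print(reasoning)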
Part 2: Physical AI AV dataset for large-scale, diverse AV data
The PhysicalAI-Autonomous-Vehicles dataset provides one of the largest, most geographically diverse collections of multi-sensor data for AV researchers to build the next generation of physical AI-based end-to-end driving systems.

It contains a total of 1,727 hours of driving recorded in 25 countries and over 2,500 cities (coverage shown below, with color indicating the number of clips per country). The dataset captures diverse traffic, weather conditions, obstacles, and pedestrians in the environment. Overall, it consists of 310,895 clips that are each 20 seconds long. The sensor data includes multi-camera and LiDAR coverage for all clips, and radar coverage for 163,850 clips.

To get started with the Physical AI AV Dataset, use the physical_ai_av GitHub repository, which contains a Python developer kit and documentation (in the form of a wiki). In fact, this package was already used in Part 1 to load a sample of the dataset for Alpamayo 1.
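If you want to poke at the data directly, loading a clip might look roughly like the following. This is a hypothetical sketch: the actual class and method names are documented in the physical_ai_av wiki, so treat everything below as a placeholder.
# Hypothetical sketch: consult the physical_ai_av wiki for the real API;
# the class and attribute names here are placeholders.
from physical_ai_av import ClipDataset     # placeholder entry point

dataset = ClipDataset(root="data/physical_ai_av")   # local download location
clip = dataset[0]                                   # one 20-second clip

# Each clip bundles multi-camera video and LiDAR, plus radar for a subset
# of clips, alongside the ego trajectory used as ground truth in Part 1.
print(clip.metadata)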
Part 3: AlpaSim, a closed-loop simulation for AV evaluation
AlpaSim overview

AlpaSim is built on a microservice architecture centered around the Runtime (see Figure 6), which orchestrates all simulation activity. Individual services, such as the Driver, Renderer, TrafficSim, Controller, and Physics, run in separate processes and can be assigned to different GPUs. This design offers two major advantages:
- Clear, modular APIs via gRPC, making it easy to integrate new services without dependency conflicts.
- Arbitrary horizontal scaling, allowing researchers to allocate compute where it matters most. For example, if driver inference becomes the bottleneck, simply launch additional driver processes. If rendering is the bottleneck, dedicate more GPUs to rendering. And if a rendering process cannot handle multiple scenes simultaneously, you can run multiple renderer instances on the same GPU to maximize utilization.
But horizontal scaling alone isn’t the full story. The real power of AlpaSim lies in how the Runtime enables pipeline parallelism (see Figure 7).
In traditional sequential rollouts, components must wait on one another: for instance, the driver must pause after each inference step until the renderer produces the next perception input. AlpaSim removes this bottleneck: while one scene is rendering, the driver can run inference for another scene. This overlap dramatically improves GPU utilization and throughput. Scaling even further, driver inference can be batched across many scenes, while multiple rendering processes generate perception inputs in parallel.

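Conceptually, the overlap works like the toy sketch below (plain asyncio, not AlpaSim code): while one scene waits on rendering, another scene's inference proceeds, so neither the renderer nor the driver sits idle.
# Conceptual illustration only, not AlpaSim code: overlapping rendering and
# driver inference across scenes with asyncio.
import asyncio

async def render(scene: int, step: int) -> str:
    await asyncio.sleep(0.05)              # stand-in for GPU rendering time
    return f"frame(scene={scene}, step={step})"

async def infer(frame: str) -> str:
    await asyncio.sleep(0.03)              # stand-in for driver inference time
    return f"action_for[{frame}]"

async def rollout(scene: int, n_steps: int) -> None:
    for step in range(n_steps):
        frame = await render(scene, step)
        action = await infer(frame)        # other scenes keep rendering meanwhile
        print(action)

async def main() -> None:
    # Run several scene rollouts concurrently so rendering and inference of
    # different scenes overlap in time.
    await asyncio.gather(*(rollout(scene, n_steps=3) for scene in range(4)))

asyncio.run(main())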
A shared ecosystem
We provide initial implementations for all core services, including rendering via the NVIDIA Omniverse NuRec 3DGUT algorithm, a reference controller, and driver baselines. We will also add additional driver models, including Alpamayo 1 and CAT-K, in the coming weeks.
The platform initially ships with roughly 900 reconstructed scenes, each 20 seconds long, as well as the Physical AI AV Dataset, giving researchers an immediate way to evaluate end-to-end models in realistic closed-loop scenarios. In addition, AlpaSim offers extensive configurability, from camera parameters and rendering frequency to artificial latencies and many other simulation settings.
Beyond these built-in components, we see AlpaSim evolving into a broader collaborative ecosystem in which labs can seamlessly plug in their own driving, rendering, or traffic models and compare approaches directly on shared benchmarks.
AlpaSim in action
AlpaSim is already powering several of our internal research efforts.
Firstly, in our recently proposed Sim2Val framework, we demonstrated that AlpaSim rollouts are realistic enough to meaningfully improve real-world validation. By incorporating simulated trajectories into our evaluation pipeline, we were able to reduce variance in key real-world metrics by up to 83%, enabling faster and more confident model assessments.
Secondly, we rely on AlpaSim for closed-loop evaluation of our Alpamayo 1 model. By replaying reconstructed scenes and allowing the policy to drive end-to-end, we compute a DrivingScore that reflects performance under realistic traffic conditions.
Beyond evaluation, we are leveraging AlpaSim for closed-loop training using our concurrently released RoaD algorithm. RoaD effectively mitigates covariate shift between open-loop training and closed-loop deployment while being significantly more data-efficient than traditional reinforcement learning.

Getting started with AlpaSim
Get started using AlpaSim for your own model evaluation in just three steps.
Step 1: Access AlpaSim
The open source repository contains the necessary software, with scene reconstruction artifacts available from the NVIDIA Physical AI Open Dataset.
Step 2: Prepare your environment
First, make sure to follow the onboarding steps in ONBOARDING.md.
Then, perform initial setup/installations with the following command:
source setup_local_env.sh
This will compile protos, download an example driver model, download a sample scene from Hugging Face, and install the alpasim_wizard command line tool.
Step 3: Run the simulation
Use the wizard to build, run, and evaluate a simulation rollout:
alpasim_wizard +deploy=local wizard.log_dir=$PWD/tutorial
The simulation logs and output can be found in the created tutorial directory. For a visualization of the results, an mp4 file is created at tutorial/eval/videos/clipgt-05bb8212..._0.mp4, which will look similar to the following.
For more details about the output, and much more information about using AlpaSim, please see TUTORIAL.md.
Overall, this example demonstrates how real-world drives can be replayed with an end-to-end policy, including all static and dynamic objects from the original scene. From this starting point, and with AlpaSim's flexible plug-and-play architecture, users can tweak contender behavior, modify camera parameters, and iterate on their policy.
Integrating your policy
Driving policies are easily swappable through generic APIs, allowing developers to test their state-of-the-art implementations.
Step 1: gRPC integration
AlpaSim uses gRPC as the interface between components: a sample implementation of the driver component can be used as inspiration for conforming to the driver interface.
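To give a feel for what conforming to the driver interface involves, here is a hedged skeleton of a gRPC driver service. The real service and message definitions come from the protos compiled during setup, so the generated-module and method names below (driver_pb2, driver_pb2_grpc, Infer) are placeholders rather than the actual AlpaSim interface.
# Illustrative skeleton only: the actual service/message names are defined by
# the protos compiled during setup; driver_pb2* below are placeholders.
from concurrent import futures
import grpc

import driver_pb2          # placeholder for the generated request/response messages
import driver_pb2_grpc     # placeholder for the generated service stubs


def my_policy(request):
    """Stand-in for your model: map camera observations to a planned trajectory."""
    return []


class MyDriver(driver_pb2_grpc.DriverServicer):
    def Infer(self, request, context):
        # The request carries the rendered perception input for the current step;
        # return the planned trajectory so the Runtime can advance the scene.
        return driver_pb2.InferResponse(trajectory=my_policy(request))


def serve(port: int = 50051) -> None:
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    driver_pb2_grpc.add_DriverServicer_to_server(MyDriver(), server)
    server.add_insecure_port(f"[::]:{port}")
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()
Packaged in a Docker image, a server like this is what the driver configuration in Step 2 points at.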
Step 2: Reconfigure and run
AlpaSim is highly customizable through YAML configuration files, including the specification of which components the sim uses at runtime. Create a new configuration file for your model (an example is shown below):
# driver_configs/my_model.yaml
# @package _global_
services:
  driver:
    image: <user docker image>
    command:
      - "<command to start user-defined service>"
And run:
alpasim_wizard +deploy=local wizard.log_dir=$PWD/my_model +driver_configs=my_model.yaml
You can also customize the configuration via the CLI when running the wizard example:
# Different scene
alpasim_wizard +deploy=local wizard.log_dir=$PWD/custom_run \
scenes.scene_ids=['clipgt-02eadd92-02f1-46d8-86fe-a9e338fed0b6']
# More rollouts
alpasim_wizard +deploy=local wizard.log_dir=$PWD/custom_run \
runtime.default_scenario_parameters.n_rollouts=8
# Different simulation length
alpasim_wizard +deploy=local wizard.log_dir=$PWD/custom_run \
runtime.default_scenario_parameters.n_sim_steps=200
Configuration is managed via Hydra – see src/wizard/configs/base_config.yaml for all available options.
To download the scene referenced above in Figure 9, you can run the following command:
hf download --repo-type=dataset \
--local-dir=data/nre-artifacts/all-usdzs \
nvidia/PhysicalAI-Autonomous-Vehicles-NuRec \
sample_set/25.07_release/Batch0001/02eadd92-02f1-46d8-86fe-a9e338fed0b6/02eadd92-02f1-46d8-86fe-a9e338fed0b6.usdz
Scaling your runs
AlpaSim adapts to fit your hardware configuration through coordination and parallelization of services, efficiently facilitating large test suites, perturbation studies, and training. For example, to run a test suite with 16 rollouts per scene:
alpasim_wizard +deploy=local wizard.log_dir=$PWD/test_suite +experiment=my_test_suite.yaml runtime.default_scenario_parameters.n_rollouts=16
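The my_test_suite.yaml experiment file referenced above is yours to define. As a rough sketch, it could group the same Hydra overrides shown earlier (scene IDs, rollout counts, simulation length); the exact file location and keys follow the conventions in src/wizard/configs.
# my_test_suite.yaml (illustrative sketch; keys mirror the CLI overrides above)
# @package _global_
scenes:
  scene_ids:
    - clipgt-02eadd92-02f1-46d8-86fe-a9e338fed0b6
runtime:
  default_scenario_parameters:
    n_rollouts: 16
    n_sim_steps: 200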
Conclusion: Putting it all together
The future of autonomous driving relies on powerful end-to-end models, and AlpaSim provides the capability to quickly test and iterate on those models, accelerating research efforts. In this blog, we introduced the Alpamayo 1 model, the Physical AI AV dataset, and the AlpaSim simulator. Together, they provide a complete framework for developing reasoning-based AV systems: a model, large amounts of data to train it, and a simulator for evaluation.
Putting it all together, below is an example of Alpamayo 1 driving closed-loop through a construction zone within AlpaSim, demonstrating the model’s reasoning and driving capabilities as well as AlpaSim’s ability to evaluate AV models in a variety of realistic driving environments.
Happy coding!