Robotics

R²D²: Scaling Multimodal Robot Learning with NVIDIA Isaac Lab

Building robust, intelligent robots requires testing them in complex environments. However, gathering data in the physical world is expensive, slow, and often dangerous. It is nearly impossible to safely train for critical real-world risks, such as high-speed collisions or hardware failures. Worse, real-world data is usually biased toward “normal” conditions, leaving robots unprepared for the unexpected.

Simulation is essential to bridge this gap, providing a risk-free environment for rigorous development. However, traditional pipelines struggle to support the complex needs of modern robotics. Today’s generalist robots must master multimodal learning—fusing diverse inputs such as vision, touch, and proprioception to navigate messy, unstructured worlds. This creates a new requirement for simulation: it must deliver scale, realism, and multimodal sensing all in one tight training loop, something traditional CPU-bound simulators cannot handle efficiently.

This edition of NVIDIA Robotics Research and Development Digest (R²D²) explains how NVIDIA Isaac Lab, an open source GPU-native simulation framework from NVIDIA Research, unifies these capabilities in a single stack designed for large-scale, multimodal robot learning.

Key robot learning challenges

Modern robot learning in simulation pushes simulation infrastructure to its limits. To train robust policies efficiently, researchers must overcome critical hurdles, including:

  • Scaling simulation to thousands of parallel environments to overcome the slow training times of CPU-bound tools
  • Integrating multiple sensor modalities (vision, force, and proprioception) into synchronized, high-fidelity data streams
  • Modeling realistic actuators and control frequencies to capture the nuances of physical hardware
  • Bridging the gap between simulation and real-world deployment through robust domain randomization and accurate physics

Isaac Lab: Open source, unified framework for robot learning

Isaac Lab is a unified, GPU-native simulation framework built for multimodal robot learning and designed to address these challenges. By consolidating physics, rendering, sensing, and learning into a single stack, it gives researchers the technology to train generalist agents at unprecedented scale and fidelity.

Figure 1. Isaac Lab simulation framework supports diverse robotic applications

Isaac Lab core elements

The key elements of Isaac Lab include:

  • GPU-native architecture: Delivers end-to-end GPU acceleration for physics and rendering, enabling massive parallelism to drastically reduce training time.
  • Modular and composable design: Features flexible components for diverse embodiments (humanoids, manipulators) and reusable environments to accelerate development.
  • Multimodal simulation: Leverages tiled RTX rendering and Warp-based sensors to generate rich, synchronized observations (vision, depth, tactile) alongside realistic multi-frequency control loops.
  • Integrated workflows: Provides built-in support for reinforcement learning (RL) and imitation learning (IL), streamlining large-scale data collection, domain randomization, and policy evaluation. It connects out-of-the-box with top RL libraries including SKRL, RSL-RL, RL-Games, SB3, and Ray, and seamlessly integrates with NVIDIA Cosmos-generated data for augmented imitation learning.

Inside the Isaac Lab framework: A modular toolkit

Isaac Lab breaks down robot learning into composable building blocks, enabling you to build complex, scalable tasks without “reinventing the wheel.”

Figure 2. Isaac Lab includes diverse assets, multimodal sensors, and standard controllers

Features include a manager-based workflow, procedural scene generation, and more.

Manager-based workflow

Instead of writing monolithic scripts that mix physics and logic, Isaac Lab decouples your environment into separate “Managers” for observations, actions, rewards, and events. This makes your code modular and reusable. For example, you can swap a robot’s reward function without touching its sensor setup.

from isaaclab.envs import ManagerBasedRLEnvCfg
from isaaclab.managers import RewardTermCfg as RewTerm
from isaaclab.utils import configclass

# `mdp` is the task's library of reusable term functions
# (here, the velocity locomotion task's mdp module)
import isaaclab_tasks.manager_based.locomotion.velocity.mdp as mdp


@configclass
class MyRewardsCfg:
    # Define rewards as weighted terms
    track_lin_vel = RewTerm(func=mdp.track_lin_vel_xy_exp, weight=1.0, params={"std": 0.5})
    penalty_lin_vel_z = RewTerm(func=mdp.lin_vel_z_l2, weight=-2.0)


@configclass
class MyEnvCfg(ManagerBasedRLEnvCfg):
    # Plug in the reward config cleanly
    rewards: MyRewardsCfg = MyRewardsCfg()
    # ... other managers for actions, observations, etc.
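
At runtime, the composed config is handed to the environment class, which builds every manager and steps them together on the GPU. The following is a minimal sketch, assuming the simulation app is already launched and the remaining manager configs (scene, observations, actions) are filled in:

from isaaclab.envs import ManagerBasedRLEnv

# Construct the environment from the composed config; the reward,
# observation, action, and event managers are created from their configs.
env = ManagerBasedRLEnv(cfg=MyEnvCfg())
obs, extras = env.reset()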

Procedural scene generation

To prevent overfitting, you rarely want to train on a single static scene. With the Isaac Lab scene generation tools, you can define rules to spawn diverse environments procedurally. Whether it’s scattering debris for a navigation task or generating rough terrain for locomotion, you define the logic once, and the framework builds thousands of variations on the GPU.

from isaaclab.terrains import TerrainGeneratorCfg, MeshPyramidStairsTerrainCfg, MeshRandomGridTerrainCfg

# Configure a terrain generator with diverse sub-terrains
terrain_cfg = TerrainGeneratorCfg(
    sub_terrains={
        "pyramid_stairs": MeshPyramidStairsTerrainCfg(
            proportion=0.2, step_height_range=(0.05, 0.2)
        ),
        "rough_ground": MeshRandomGridTerrainCfg(
            proportion=0.8, noise_scale=0.1
        ),
    }
)
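
To use the generated terrain in a scene, the configuration is typically handed to a terrain importer, which builds the meshes once and shares them across all parallel environments. A minimal sketch, with the prim path and curriculum level chosen here for illustration:

from isaaclab.terrains import TerrainImporterCfg

# Import the procedurally generated terrain into the simulation scene
terrain = TerrainImporterCfg(
    prim_path="/World/ground",
    terrain_type="generator",        # build terrain from the generator config
    terrain_generator=terrain_cfg,   # the TerrainGeneratorCfg defined above
    max_init_terrain_level=5,        # start robots on the easier sub-terrains
)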

More features

In addition, Isaac Lab provides: 

  • A unified asset API for importing any robot from USD, URDF, or MJCF
  • Realistic actuator models that capture motor dynamics, alongside more than 10 sensor types ranging from IMUs to photorealistic RTX cameras
  • A built-in teleoperation stack that further simplifies data collection

Together, these features provide what you need to efficiently move from prototype to deployed policy.
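
For example, a robot asset and its actuators can be declared together, pairing a USD (or converted URDF/MJCF) model with explicit motor parameters. The sketch below uses placeholder values; the USD path, joint name patterns, and gains are illustrative, not a shipped asset:

import isaaclab.sim as sim_utils
from isaaclab.actuators import ImplicitActuatorCfg
from isaaclab.assets import ArticulationCfg

MY_ROBOT_CFG = ArticulationCfg(
    # Spawn the robot from a USD file (placeholder path)
    spawn=sim_utils.UsdFileCfg(usd_path="/path/to/my_robot.usd"),
    # Model the leg motors with PD gains and an effort limit
    actuators={
        "legs": ImplicitActuatorCfg(
            joint_names_expr=[".*_hip_.*", ".*_knee"],
            stiffness=100.0,
            damping=5.0,
            effort_limit=300.0,
        ),
    },
)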

Delivering GPU-accelerated performance at scale

Isaac Lab delivers the massive throughput required for modern robot learning, achieving 135,000 FPS for humanoid locomotion (Unitree H1) and over 150,000 FPS for manipulation (Franka Cabinet)—training policies in minutes rather than days. Its unified GPU architecture eliminates CPU bottlenecks, maintaining high throughput even with complex RGB-D sensors enabled across 4,096 environments. 

Benchmarks confirm linear scaling with VRAM and successful zero-shot transfer for diverse embodiments, including dexterous hands, multi-agent swarms, and the H1 humanoid walking robustly outdoors.

A canonical robot learning workflow

Isaac Lab standardizes the robot learning loop into a clear, Python-first workflow. Whether you’re training a locomotion policy or a manipulation skill, the process follows the same four steps: design, randomize, train, and validate.

To run a complete example—training a humanoid to walk—right out of the box, follow the steps below.

Step 1: Design and configure

First, define your environment in Python. Select your robot (Unitree H1, for example), sensors, and randomization logic using a configuration class:

# pseudo-code representation of a config
@configclass
class H1FlatEnvCfg(ManagerBasedRLEnvCfg):
    scene = InteractiveSceneCfg(num_envs=4096, env_spacing=2.5)
    robot = ArticulationCfg(prim_path="{ENV_REGEX_NS}/Robot", spawn=...)
    # Randomization and rewards are defined here

For more details, see the H1 Humanoid Environment Configuration in the isaac-sim/IsaacLab GitHub repo. Optionally, you can add more sensors to the configuration; the examples below show two common ones.

Configure a tiled camera:

import isaaclab.sim as sim_utils
from isaaclab.sensors import TiledCameraCfg

# Define a camera attached to the robot's head
tiled_camera: TiledCameraCfg = TiledCameraCfg(
    prim_path="{ENV_REGEX_NS}/Robot/head/camera",
    offset=TiledCameraCfg.OffsetCfg(
        pos=(-7.0, 0.0, 3.0),
        rot=(0.9945, 0.0, 0.1045, 0.0),
        convention="world",
    ),
    data_types=["rgb"],
    spawn=sim_utils.PinholeCameraCfg(
        focal_length=24.0,
        focus_distance=400.0,
        horizontal_aperture=20.955,
        clipping_range=(0.1, 20.0),
    ),
    width=80,
    height=80,
)

Configure a ray-caster (LiDAR):

from isaaclab.sensors import RayCasterCfg, patterns

# Define a 2D LiDAR scanner
lidar = RayCasterCfg(
    prim_path="{ENV_REGEX_NS}/Robot/base_link/lidar",
    mesh_prim_paths=["/World/ground"],  # meshes the rays can hit (e.g., the terrain)
    update_period=0.1,       # Run at 10 Hz
    offset=RayCasterCfg.OffsetCfg(pos=(0.0, 0.0, 0.2)),
    attach_yaw_only=True,    # Stabilize against robot tilt
    pattern_cfg=patterns.LidarPatternCfg(
        channels=32,
        vertical_fov_range=(-15.0, 15.0),
        horizontal_fov_range=(-180.0, 180.0)
    )
)
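
Once the environment is running (see Step 2), these sensor buffers can be read back as batched GPU tensors. A minimal sketch, assuming the camera and LiDAR above were added to the scene config under the names tiled_camera and lidar, and that env is the instantiated (possibly gym-wrapped) environment:

# Look up sensors by the names used in the scene config
camera = env.unwrapped.scene["tiled_camera"]
lidar = env.unwrapped.scene["lidar"]

# Batched observations covering all parallel environments
rgb = camera.data.output["rgb"]   # shape: (num_envs, height, width, channels)
hits = lidar.data.ray_hits_w      # shape: (num_envs, num_rays, 3), world frame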

Step 2: Train the policy

Next, launch a training script to start learning. Isaac Lab exposes its tasks through the Gymnasium interface, so it connects easily to RL libraries like RSL-RL or SKRL.

# Train a policy for the Unitree H1 humanoid
# This runs 4096 environments in parallel on your GPU
python source/standalone/workflows/rsl_rl/train.py --task=Isaac-Velocity-Flat-H1-v0
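
Because every task is registered with Gymnasium, you can also drive an environment directly from Python, which is handy for debugging observations or collecting data. Below is a minimal sketch of the same H1 task stepped with random actions, assuming Isaac Sim is installed and can run headless:

from isaaclab.app import AppLauncher

# The simulation app must be launched before importing other Isaac Lab modules
simulation_app = AppLauncher(headless=True).app

import gymnasium as gym
import torch

import isaaclab_tasks  # noqa: F401  (registers the Isaac-* tasks with Gymnasium)
from isaaclab_tasks.utils import parse_env_cfg

# Load the task config and shrink it for a quick interactive test
env_cfg = parse_env_cfg("Isaac-Velocity-Flat-H1-v0", num_envs=16)
env = gym.make("Isaac-Velocity-Flat-H1-v0", cfg=env_cfg)

obs, info = env.reset()
for _ in range(100):
    # Sample random actions in [-1, 1] for every parallel environment
    actions = 2.0 * torch.rand(env.action_space.shape, device=env.unwrapped.device) - 1.0
    obs, reward, terminated, truncated, info = env.step(actions)

env.close()
simulation_app.close()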

Step 3: Play and visualize

Once training is complete, verify the policy by running it in inference mode. This loads the trained checkpoint and renders the result.

# Run the trained policy and visualize the robot walking
python source/standalone/workflows/rsl_rl/play.py --task=Isaac-Velocity-Flat-H1-v0

Step 4: Sim-to-real deployment

After validation, the policy can be exported to ONNX or TorchScript for deployment on physical hardware, leveraging the domain randomization applied during training. For real-world examples, see the Sim-to-Real Deployment Guide.
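
As a rough illustration, a trained actor network can be exported with standard PyTorch tooling (the Isaac Lab workflow scripts also provide exporter helpers). The sketch below is generic rather than Isaac Lab's own exporter, and the observation and action sizes are placeholders:

import torch

# Placeholder policy standing in for a trained actor network
policy = torch.nn.Sequential(
    torch.nn.Linear(48, 256), torch.nn.ELU(),
    torch.nn.Linear(256, 19),
).eval()

# Example input matching the policy's (assumed) observation dimension
dummy_obs = torch.zeros(1, 48)

# TorchScript export for C++ or onboard runtimes
torch.jit.trace(policy, dummy_obs).save("policy.pt")

# ONNX export for runtimes such as TensorRT or ONNX Runtime
torch.onnx.export(policy, dummy_obs, "policy.onnx",
                  input_names=["obs"], output_names=["actions"])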

Ecosystem adoption

Leading organizations and research labs in humanoid robotics, embodied AI, and legged locomotion are deploying Isaac Lab to accelerate the development of generalist robot policies and foundation models, including:

  • Agility Robotics’ general-purpose humanoid, Digit, uses Isaac Lab to refine whole-body control across millions of reinforcement learning scenarios, accelerating skills such as step recovery from environmental disturbances that are essential in highly dynamic settings like manufacturing and logistics facilities.
  • Skild AI is building a general-purpose robotics foundation model that spans legged, wheeled, and humanoid robots, using Isaac Lab to train locomotion and dexterous manipulation tasks and NVIDIA Cosmos world foundation models to generate training datasets.
  • FieldAI is training cross-embodied robot brains for monitoring and inspection in construction, manufacturing, and oil and gas environments, using Isaac Lab for reinforcement learning and NVIDIA Isaac Sim for synthetic data generation and software-in-the-loop validation.
  • The Robotics and AI Institute uses NVIDIA Isaac Lab to train high-performance reinforcement learning controllers for agile legged locomotion, dynamic whole-body manipulation, and custom robotics platforms, optimizing simulator parameters to close the sim-to-real gap before deploying policies on Boston Dynamics Spot and Atlas, and RAI’s Ultra Mobile Vehicle (UMV).
  • UCR is building rugged humanoid robots for heavy industries on the NVIDIA Isaac platform, using Isaac GR00T’s synthetic data pipelines, Isaac Lab, and Isaac Sim to train end‑to‑end mobility policies and iteratively close sim-to-real gaps for robust deployment of Moby in harsh construction and industrial sites.

Get started with multimodal robot learning

Ready to scale your own multimodal robot learning workloads with Isaac Lab? Start here with core resources and level up with the latest research for advanced workflows.

Learn more about how researchers are leveraging simulation and generative AI to push the boundaries of robot learning:

  • Harmon: Combines language models and physics to generate expressive whole-body humanoid motions directly from text.
  • MaskedMimic: A generalist control policy that learns diverse skills through motion inpainting, simplifying humanoid control without complex rewards.
  • SIMPLER: A framework for evaluating real-world manipulation policies (RT-1, Octo) in simulation to reliably predict physical performance.

NVIDIA GTC AI Conference is happening March 16–19, 2026 in San Jose, featuring a must-see keynote from CEO Jensen Huang at SAP Center on March 16 at 11:00 a.m. Pacific time. Discover GTC robotics sessions on how AI, simulation, and accelerated computing are enabling robots to see, learn, and make decisions in real time.

This post is part of our NVIDIA Robotics Research and Development Digest (R2D2) series that helps developers gain deeper insight into the SOTA breakthroughs from NVIDIA Research across physical AI and robotics applications.

Stay up to date by subscribing to the newsletter and following NVIDIA Robotics on YouTube, Discord, and the developer forums.

To get started on your robotics journey, enroll in free NVIDIA Robotics Fundamentals courses.
