
From Terabytes to Turnkey: AI-Powered Climate Models Go Mainstream

In the race to understand our planet’s changing climate, speed and accuracy are everything. But today’s most widely used climate simulators often struggle: They can’t fully capture critical small-scale processes, like thunderstorms or towering tropical clouds, because of computational limits. 

To capture these features, scientists run ultra-high-resolution simulations called cloud-resolving models (CRMs). These simulations track how clouds form and evolve—but they’re so expensive that running one for a decade of global climate forecasts is practically impossible.

What if we could distill the wisdom of these detailed simulations into a machine learning model that runs tens to hundreds of times faster, without giving up fidelity? 

That’s the promise of ClimSim-Online, a reproducible framework for developing and deploying hybrid physics-machine learning climate models at scale. This framework was produced by NVIDIA Earth-2 and a consortium of international climate modelers from across government and academia. It was initiated and supported by a Columbia University-based, National Science Foundation-funded science and technology center that is exploring the future of AI-powered climate simulation technology.

From terabytes to turnkey: training AI to emulate complex nested climate physics 

ClimSim-Online builds on the award-winning ClimSim dataset, introduced at NeurIPS 2023 and served on the ClimSim Hugging Face repository. The dataset was created using the Energy Exascale Earth System Model-Multiscale Modeling Framework (E3SM-MMF)—a next-generation climate simulator that embeds thousands of localized, computationally intensive CRMs within each atmospheric column of a host coarse-grid climate model. It’s an experimental way to generate climate predictions that reduces the number of assumptions that must typically be made about fine-scale physics—but it comes at such computational cost that it is not used in mainstream international projections. Outsourcing the nested physics to AI could change that.
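
To make the entry point concrete, here is a minimal sketch of pulling a slice of the dataset from Hugging Face with the huggingface_hub client. The repository ID and file pattern are placeholders to adapt to the actual ClimSim Hugging Face repository layout, not verified values.

```python
# Minimal sketch: download part of the ClimSim dataset from Hugging Face.
# The repo_id and allow_patterns values are placeholders -- check the ClimSim
# Hugging Face repository for the actual dataset name and directory layout.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="LEAP/ClimSim_low-res",         # placeholder dataset ID
    repo_type="dataset",
    allow_patterns=["train/0001-02/*.nc"],  # placeholder: one simulated month
)
print(f"Data downloaded to {local_dir}")
```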

The host climate model operates at a horizontal resolution of approximately 1.5 degrees (about 150 km) or coarser, while each embedded CRM runs at 2 km resolution, explicitly simulating clouds and convection at much finer scales.

Over a simulated 10-year span, E3SM-MMF produced a staggering 5.7 billion samples, each describing how small-scale physical processes alter the large-scale atmospheric state. These processes include how turbulent updrafts lead to cloud formation, what causes microphysical droplets to form, how convection organizes from scales of individual clouds to large organized cloud complexes, and how these cloud systems interact with solar and infrared radiation, thereby regulating climate. 

This massive dataset serves as the foundation for training ML models that emulate subgrid physics and can replace the expensive embedded CRM, which consumes approximately 95% of the total computational cost. The dataset has already spurred a global Kaggle competition that attracted over 460 teams to develop and benchmark ML solutions on this high-fidelity climate dataset—helping accelerate progress through open, collaborative innovation.

Figure 1. A schematic of the ClimSim dataset and the underlying machine-learning problem. Inputs consist of a set of macro-scale state variables; targets primarily include the tendencies of those state variables due to unresolved processes.
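
For orientation, the learning problem in Figure 1 reduces to a per-column regression from stacked macro-scale state variables to subgrid tendencies. The snippet below is a hypothetical MLP baseline in PyTorch; the input and output sizes are illustrative stand-ins for the actual ClimSim variable lists, not the official configuration.

```python
import torch
import torch.nn as nn

# Illustrative dimensions only: the real ClimSim variable lists (vertically
# resolved states plus scalar fields) define the exact input/output sizes.
N_IN, N_OUT = 124, 128

class SubgridEmulator(nn.Module):
    """Hypothetical MLP baseline mapping column state -> subgrid tendencies."""

    def __init__(self, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_IN, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, N_OUT),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = SubgridEmulator()
x = torch.randn(32, N_IN)       # a batch of 32 atmospheric columns
tendencies = model(x)           # predicted subgrid tendencies per column
```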

The challenge? These models need to be more than just accurate offline. They must remain stable when integrated into a live climate simulator—running hour after hour, year after year—without letting the virtual atmosphere drift into unrealistic states. Controlling the behavior of hybrid physics-ML simulations is a marquee challenge, especially in situations where the host physics model cannot be made differentiable. Some simple host models can be rewritten in differentiable code, enabling ML optimization of hybrid dynamics directly. But many candidate host models are not easy to rewrite differentiably, or are so nonlinear that direct optimization on hybrid behavior is impractical. Fully featured climate simulators spanning millions of lines of source code are a prime example. 

Plug in and simulate 

ClimSim-Online was pioneered by NVIDIA to make hybrid climate modeling accessible to the broader ML community. We built a reproducible, containerized workflow to sidestep the typical blockers of running fully featured climate simulators—such as dependencies on specific supercomputing and software environments—that limit who can interact with them. With just a TorchScript model file, users can inject their trained ML model into the Fortran-based E3SM climate simulator and launch hybrid simulations, whether on local workstations, HPC clusters, or cloud VMs. They can then plug into standardized diagnostics to measure their success.
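
As a concrete illustration of that handoff, the sketch below serializes a trained PyTorch emulator to a single TorchScript file. The stand-in network, input size, and output filename are assumptions for illustration, not the exact configuration used in the E3SM coupling.

```python
import torch
import torch.nn as nn

# Stand-in for a trained emulator; in practice this would be the trained
# network (for example, a U-Net) with its checkpoint weights loaded.
model = nn.Sequential(nn.Linear(124, 512), nn.GELU(), nn.Linear(512, 128))
model.eval()

example_input = torch.randn(1, 124)   # illustrative input shape

# Trace the model into TorchScript so the Fortran-side bridge can load one
# self-contained model file with no Python runtime required.
scripted = torch.jit.trace(model, example_input)
torch.jit.save(scripted, "emulator.pt")
```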

Figure 2. A flowchart of the E3SM-MMF simulation workflow. Each block represents a module executed during one model time step. In standard simulations, the “physics-before-coupling” module includes the embedded CRM. In hybrid mode, this module is replaced by the ML emulator. The full workflow is containerized for easy deployment across platforms.

It’s climate emulation, now plug-and-play

The entire system runs in a container preloaded with all required libraries and dependencies: just load, mount, and simulate. Instructions for setting up the container are in the ClimSim-Online repository, and the entire workflow—from accessing the data and training the machine learning model to running and evaluating hybrid climate simulations—is documented in the ClimSim repository.

A breakthrough: stable for years, realistic to the tropopause 

Scientists in the NVIDIA Research and Developer Technology organizations have now made an important breakthrough using these new APIs. In our latest paper, published July 10 in the Journal of Advances in Modeling Earth Systems (JAMES), we demonstrate multi-year, stable hybrid simulations with a U-Net neural network trained on the ClimSim dataset using PhysicsNeMo, establishing a new benchmark for online skill within ClimSim-Online. PhysicsNeMo is an open-source deep-learning framework that enables users to explore, develop, validate, and deploy state-of-the-art methods for science and engineering that combine physics-based knowledge with data.

But the real breakthrough? Physics-informed machine learning. 

To avoid runaway simulations and unrealistic cloud behavior, we built microphysical constraints directly into the neural network architecture (see the sketch after this list): 

  • All condensates follow temperature-based phase partitioning, just like the cloud-resolving model that the neural network is emulating. 
  • No lingering ice clouds above the tropopause. 
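
The snippet below sketches one way such hard constraints can be wired into a network’s output layer. The temperature thresholds, tropopause index, and tensor layout are illustrative assumptions, not the exact values or implementation from the paper.

```python
import torch
import torch.nn as nn

class MicrophysicsConstraintLayer(nn.Module):
    """Illustrative output layer enforcing the two constraints listed above.

    Assumes the network predicts a total condensate per vertical level and
    that temperature is available as an input; the thresholds and tropopause
    index are placeholder values, not the paper's exact configuration.
    """

    def __init__(self, t_all_ice=253.16, t_all_liq=273.16, tropopause_idx=15):
        super().__init__()
        self.t_all_ice = t_all_ice            # at or below: condensate is all ice
        self.t_all_liq = t_all_liq            # at or above: condensate is all liquid
        self.tropopause_idx = tropopause_idx  # levels above this index treated as stratospheric

    def forward(self, total_condensate, temperature):
        # Temperature-based phase partitioning (linear ramp between thresholds),
        # mirroring the partitioning used by the emulated cloud-resolving model.
        liq_frac = torch.clamp(
            (temperature - self.t_all_ice) / (self.t_all_liq - self.t_all_ice),
            0.0, 1.0,
        )
        cloud_liq = liq_frac * total_condensate
        cloud_ice = (1.0 - liq_frac) * total_condensate

        # No lingering ice clouds above the tropopause: mask out ice on levels
        # above the (illustrative) tropopause index, assuming levels are
        # ordered top-of-atmosphere first.
        mask = torch.ones_like(cloud_ice)
        mask[..., : self.tropopause_idx] = 0.0
        return cloud_liq, cloud_ice * mask

# Example: 8 columns with 60 vertical levels of condensate and temperature.
layer = MicrophysicsConstraintLayer()
ql, qi = layer(torch.rand(8, 60) * 1e-4, torch.linspace(200.0, 300.0, 60).repeat(8, 1))
```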

With these hard constraints, we stabilized previously drifting simulations and drastically improved the realism of cloud climatologies—especially in the tropics, where unconstrained models tended to overestimate clouds at high altitudes.

The research process that led to this solution was fundamentally accelerated by ClimSim-Online: Being able to rapidly iterate on evolving downstream hybrid model pathologies was key to unearthing the clues that ultimately informed our scientific detective work.

Figure 3. The online monthly, globally averaged root mean square error of temperature and moisture over a one-year simulation. Red lines represent hybrid models with microphysics constraints, approaching the theoretical lower bound of internal atmosphere unpredictability. Blue and cyan lines show unstable error growth for unconstrained MLP and U-Net baselines.

In our hybrid simulations, the temperature bias stays under 2 degrees Celsius and the humidity bias under 1 gram per kilogram within the troposphere—a new state-of-the-art result under the multiscale modeling framework.
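
For readers who want to compute this kind of diagnostic themselves, the sketch below shows an area-weighted global-mean bias and RMSE on a latitude/longitude grid. The array shapes and variable layout are assumptions about how monthly-mean hybrid and reference fields might be stored, not the project’s official diagnostics code.

```python
import numpy as np

def global_stats(pred, ref, lat):
    """Area-weighted global-mean bias and RMSE for fields shaped [lat, lon]."""
    w = np.cos(np.deg2rad(lat))[:, None]   # area weights ~ cos(latitude)
    w = np.broadcast_to(w, pred.shape)
    diff = pred - ref
    bias = np.average(diff, weights=w)
    rmse = np.sqrt(np.average(diff**2, weights=w))
    return bias, rmse

# Example with synthetic monthly-mean temperature fields on a 2-degree grid.
lat = np.arange(-89.0, 90.0, 2.0)
lon = np.arange(0.0, 360.0, 2.0)
pred = 288.0 + np.random.randn(lat.size, lon.size)   # hybrid simulation (illustrative)
ref = 288.0 + np.random.randn(lat.size, lon.size)    # reference simulation (illustrative)
print(global_stats(pred, ref, lat))
```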

We also saw stable simulations of more than five years with explicit cloud condensate modeling, real geography, and land-atmosphere coupling, a milestone not previously demonstrated in this class of hybrid simulations.

Ready for takeoff 

ClimSim-Online lowers the barrier for AI-climate collaboration. It makes it easy to: 

  • Train ML models using world-class simulation data 
  • Benchmark offline skill 
  • And most importantly, evaluate online performance inside a full-scale climate simulator—the ultimate test of real-world readiness.  

Whether you’re an AI researcher eager to work on climate or a climate scientist curious about the power of hybrid modeling, ClimSim-Online brings you tools to join the next wave of climate simulation. 

While we have demonstrated a domain science-informed approach to solving first-order issues of hybrid modeling, much more work remains to bring hybrid biases down to truly tolerable levels. And new ideas are needed. For instance: Could the reinforcement learning community find an even more robust solution agnostic to domain science? Now that ClimSim-Online makes it easy to sample the downstream, non-differentiable reward signal, perhaps we will soon find out. The future of hybrid physics-ML climate simulation awaits.
