How to Run AI-Powered CAE Simulations

In modern engineering, the pace of innovation is closely linked to the ability to perform accelerated simulations. Computer-aided engineering (CAE) plays a vital role in the design of optimal and reliable engineering products by helping verify performance and safety. Traditional numerical simulations produce accurate results but often require hours, days, or even weeks to run. These lengthy simulations make it challenging to explore many design options and maintain an efficient feedback loop between design and analysis.

To reduce simulation time, engineers are increasingly using physics-based AI models as surrogates. Trained on data from traditional simulations, these models can predict outcomes significantly faster, often in seconds or minutes. This rapid generation of approximate solutions enables engineers to quickly explore a wider range of design alternatives.

This approach doesn’t replace traditional solvers; instead, it complements them. Surrogate models are excellent for initial exploration, helping to pinpoint promising designs that can then be validated using more precise, trusted solvers.

This post presents an end-to-end reference workflow for an automotive aerodynamics use case that enables software developers and engineers to leverage the benefits of AI-powered simulations. It is modular, adaptable beyond external aerodynamics, and built on NVIDIA technologies. The following sections guide you through each stage and tool in the workflow:

Data preprocessing with NVIDIA PhysicsNeMo Curator: This library is a submodule of PhysicsNeMo that helps organize and process engineering and scientific datasets, making it easier and faster to set up AI model training workflows.
AI physics model training with NVIDIA PhysicsNeMo: PhysicsNeMo is a framework designed for building and training physics AI models, leveraging state-of-the-art architectures.
Deployment and inference with NVIDIA NIM microservices: A streamlined and scalable solution for deploying pretrained models, making AI-powered predictions accessible through simple, standard APIs.
Real-time, interactive visualization with NVIDIA Omniverse and Kit-CAE: Omniverse is a development platform for building digital twins, and the Kit-CAE sample built on it enables engineers to explore large CAE simulation datasets in realistic 3D environments. Leveraging Kit data delegates and APIs, it combines powerful data handling with collaborative, photorealistic visualization.

To showcase the full end-to-end workflow, we’re using a lightweight yet insightful dataset featuring multiple Ahmed body designs simulated across a range of inlet Reynolds numbers. This dataset includes 3D geometry, pressure distributions, and wall shear stress fields, which are the quantities an aerodynamic surrogate model needs to learn. Its compact size enables quick experiments without heavy storage or compute needs, so it works well for teaching, prototyping, and rapid iteration. For more details, download the dataset. And if you’re ready to put your own proprietary CAE data into action, get started with the PhysicsNeMo platform.

Reference workflow for CAE. — *Figure 1. The reference workflow for CAE includes data preprocessing, training, inference, and visualization*

Data preprocessing with NVIDIA PhysicsNeMo Curator

PhysicsNeMo is an open source deep learning framework for building, training, and fine-tuning AI models for physics-based simulations. It supports various models, including neural operators and graph neural networks (GNNs). DoMINO (Decomposable Multi-scale Iterative Neural Operator) and X-MeshGraphNet are advanced architectures within PhysicsNeMo: DoMINO, a multiscale neural operator, predicts flow fields in large-scale simulations like automotive aerodynamics. X-MeshGraphNet, an enhanced GNN, overcomes scalability and mesh-dependency issues by constructing graphs directly from geometry files, making it efficient for physics simulations.

PhysicsNeMo Curator is the main engine for DoMINO data preprocessing, accelerating ETL (extraction, transformation, and loading) of scientific and engineering datasets on GPUs. Its specialized DoMINO ETL pipeline converts raw VTK (.vtu, .vtp) and STL files into ML-ready formats like Zarr or NumPy. It handles geometry, volume, and surface meshes: it extracts and non-dimensionalizes fields like pressure and wall shear stress, computes derived quantities, and optionally decimates meshes. Outputs are compressed and chunked, with metadata preserved for reproducibility.
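To make the non-dimensionalization step concrete, here is a minimal NumPy sketch. It assumes one common convention, scaling by the freestream dynamic pressure 0.5 * rho * U^2. The exact scaling that Curator applies isn't specified in this post, and the function name, argument names, and sample values below are hypothetical.

```python
import numpy as np


def nondimensionalize_surface_fields(
    pressure, wall_shear_stress, density, stream_velocity, reference_pressure=0.0
):
    """Scale pressure and wall shear stress by the freestream dynamic pressure.

    Assumed convention (not taken from Curator):
        cp = (p - p_ref) / (0.5 * rho * U^2)
        cf = tau_w / (0.5 * rho * U^2)
    """
    dynamic_pressure = 0.5 * density * stream_velocity**2
    pressure_coeff = (np.asarray(pressure, dtype=float) - reference_pressure) / dynamic_pressure
    shear_coeff = np.asarray(wall_shear_stress, dtype=float) / dynamic_pressure
    return pressure_coeff, shear_coeff


# Synthetic values for illustration only: air at 1.225 kg/m^3 and 30 m/s
# gives a dynamic pressure of 551.25 Pa.
pressure = np.array([0.0, 275.625, -551.25])
wall_shear = np.array([[5.5125, 0.0, 0.0], [0.0, 0.0, 0.0], [-5.5125, 0.0, 0.0]])

cp, cf = nondimensionalize_surface_fields(
    pressure, wall_shear, density=1.225, stream_velocity=30.0
)
print(cp)
print(cf[:, 0])
```

Working with coefficients rather than raw pressures keeps field magnitudes comparable across samples run at different inlet velocities, which is one reason this kind of scaling is commonly applied before training.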

For X-MeshGraphNet, preprocessing uses preprocessing.py to convert 3D mesh data into partitioned graphs, with nodes storing surface variables and edges capturing relationships. VTK, PyVista, and DGL are used, and outputs are saved as *.bin files for training. This workflow will eventually integrate into PhysicsNeMo Curator.

AI physics model training with NVIDIA PhysicsNeMo

PhysicsNeMo enables training of AI surrogate models for complex engineering simulations, providing scalable architectures like DoMINO and X-MeshGraphNet to accelerate CFD workflows.

DoMINO model architecture

Traditional machine learning (ML) models often struggle with accuracy, scalability, and generalization for complex engineering simulations. The DoMINO architecture addresses this with a multiscale, iterative neural operator to model large-scale physics problems. It takes a geometry (STL file) as input and predicts surface pressure, wall shear stress, and volumetric velocity fields.

The DoMINO architecture works in three main stages:

Global geometry representation: The model learns a multiscale encoding from point clouds, capturing short- and long-range dependencies, enriched with signed distance fields (SDF) and positional encoding.
Local geometry representation: The model samples discrete points at which to evaluate the solution, constructs a local subregion around each, and extracts local encodings using dynamic point convolution kernels.
Aggregation network: The network creates a computational stencil around each point by sampling neighbors in the subregion, then aggregates the local encodings to predict the solution at that point.
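The stencil idea in the last stage can be illustrated with a small NumPy sketch. This is not the DoMINO implementation: the learned aggregation network is replaced here by a fixed inverse-distance weighted average, the encodings are random placeholders, and all function names and parameters are hypothetical.

```python
import numpy as np


def sample_stencil(query_point, cloud, radius, num_neighbors, rng):
    """Return indices of up to num_neighbors points within radius of query_point."""
    distances = np.linalg.norm(cloud - query_point, axis=1)
    candidates = np.flatnonzero(distances <= radius)
    if candidates.size > num_neighbors:
        # Randomly subsample so every stencil has a bounded size.
        candidates = rng.choice(candidates, size=num_neighbors, replace=False)
    return candidates


def aggregate_encodings(query_point, cloud, encodings, neighbor_idx, eps=1e-8):
    """Inverse-distance weighted average of neighbor encodings.

    A fixed stand-in for a learned aggregation network.
    """
    if neighbor_idx.size == 0:
        raise ValueError("stencil is empty; increase the radius")
    distances = np.linalg.norm(cloud[neighbor_idx] - query_point, axis=1)
    weights = 1.0 / (distances + eps)
    weights /= weights.sum()
    return weights @ encodings[neighbor_idx]


rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(2000, 3))   # placeholder point cloud
encodings = rng.normal(size=(2000, 8))           # placeholder local encodings
query = np.zeros(3)

stencil = sample_stencil(query, cloud, radius=0.3, num_neighbors=16, rng=rng)
feature = aggregate_encodings(query, cloud, encodings, stencil)
print(stencil.size, feature.shape)
```

Bounding the stencil size keeps the cost per query point fixed, which mirrors the accuracy versus runtime trade-off that a larger stencil brings.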

DoMINO serves as a fast, accurate, and scalable surrogate for large-scale simulations.

Graphic showing DoMINO architecture. — Figure 2. DoMINO leverages point-cloud input and multiscale geometry encoding to iteratively predict flow fields at discrete points, offering a scalable and accurate surrogate model for complex physics tasks like automotive aerodynamics

How to train DoMINO

DoMINO training and testing are driven by YAML configuration files managed with Hydra, and the process takes three steps.

Step 1: Configure your settings
- Start by navigating to the src/conf directory and modifying the config.yaml file. This file allows you to specify key parameters for data, training, and testing.
- In the data section, define paths to your training and validation data, as well as bounding box sizes for your simulation domain.
- In the train section, you can set training parameters such as the number of epochs and batch size.
- If you are using cached data, you should use conf/cached.yaml instead of conf/config.yaml.
Step 2: Train the model
- To begin training, simply run the train.py script.
- The DoMINO model supports both single- and multi-GPU training. For distributed training, you can use torchrun or MPI.
- If the training process crashes due to out-of-memory (OOM) errors, reduce the number of points sampled for the volume and surface fields in the configuration file so that each batch fits in your GPU's memory.
- The model also supports automatic checkpointing, allowing it to resume from the latest checkpoint if interrupted.
Step 3: Test the model and visualize results
- After training, run the test.py script to test the model on raw simulation files (.vtp or .vtu, for example).
- Ensure that you modify the eval key in the configuration file to specify the checkpoint, input, and output directories. It is important to note that the test data should be in its raw simulation format and not processed into .npy files.
- The predictions are written back to the same test files. You can then download these results and visualize them in a tool like ParaView to assess the model’s performance.
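As a rough sketch of Step 2, the helper below builds the launch command for single- or multi-GPU training. The helper name is hypothetical and the script is assumed to be run from the directory that contains train.py. The only external flag used is torchrun's --nproc_per_node, which sets the number of worker processes per node.

```python
def build_train_command(num_gpus=1, script="train.py"):
    """Return the argument list for launching training on num_gpus local GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus == 1:
        # Single-GPU run: invoke the script directly.
        return ["python", script]
    # Multi-GPU run on one node: one worker process per GPU via torchrun.
    return ["torchrun", "--nproc_per_node", str(num_gpus), script]


single_gpu = build_train_command(1)
multi_gpu = build_train_command(4)
print(" ".join(single_gpu))
print(" ".join(multi_gpu))
```

The returned list can be passed to subprocess.run(command, check=True) from a job script, so the same code path serves a workstation and a multi-GPU node.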

X-MeshGraphNet model architecture

GNNs are a fast alternative to traditional CFD solvers but often face poor scalability on high-resolution meshes, reliance on costly pregenerated meshes, and difficulty with multiscale simulations.

Diagram showing an overview of the X-MeshGraphNet architecture. — *Figure 3. X-MeshGraphNet is a multiscale GNN with partitioned message passing for scalable physics simulations*

X-MeshGraphNet addresses scalability with three innovations:

Partitioned graphs with halo regions: Breaks large graphs into smaller overlapping subgraphs to reduce memory and computation while maintaining accuracy.
Mesh-free graph construction: Builds graphs directly from 3D geometry, removing the reliance on a pregenerated simulation mesh.
Multiscale graph refinement: Captures interactions at different scales. Halo regions and gradient aggregation ensure training on partitions matches the full graph.
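The first two ideas can be sketched in a few lines of NumPy. This is an illustration under simplifying assumptions rather than the X-MeshGraphNet code: the graph is a brute-force k-nearest-neighbor graph over a random point cloud, the partitioner simply slices nodes into slabs along x, and all function names are hypothetical.

```python
import numpy as np


def knn_edges(points, k):
    """Directed edges (i -> j) from each point to its k nearest neighbors."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist2, np.inf)  # exclude self-loops
    neighbors = np.argsort(dist2, axis=1)[:, :k]
    sources = np.repeat(np.arange(len(points)), k)
    return np.stack([sources, neighbors.ravel()], axis=1)


def partition_with_halo(points, edges, num_parts, halo_hops=1):
    """Split nodes into slabs along x, then grow each slab along graph edges."""
    order = np.argsort(points[:, 0])
    partitions = []
    for owned in np.array_split(order, num_parts):
        owned_set = set(owned.tolist())
        nodes = set(owned_set)
        for _ in range(halo_hops):
            current = list(nodes)
            touching = np.isin(edges[:, 0], current) | np.isin(edges[:, 1], current)
            nodes |= set(edges[touching].ravel().tolist())
        halo = np.array(sorted(nodes - owned_set), dtype=np.int64)
        partitions.append({"owned": np.sort(owned), "halo": halo})
    return partitions


rng = np.random.default_rng(0)
points = rng.uniform(size=(300, 3))  # placeholder surface point cloud
edges = knn_edges(points, k=6)
partitions = partition_with_halo(points, edges, num_parts=3, halo_hops=1)

for index, part in enumerate(partitions):
    print(index, len(part["owned"]), len(part["halo"]))
```

Each partition owns a disjoint set of nodes and carries a read-only halo of neighbors, so one hop of message passing on an owned node sees the same neighbors it would see in the full graph. More halo hops are needed as the number of message-passing layers grows.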

How to train X-MeshGraphNet

Step 1: Configure the model – Navigate to the surface folder and specify your configurations in the conf/config.yaml file, ensuring the dataset path is correct.
Step 2: Prepare the data – The STL files in the dataset must be combined into a single solid. Run combine_stl_solids.py to create a single solid that can be used to generate a surface point cloud.
Step 3: Preprocess the graphs – Run preprocessing.py to prepare the partitioned graphs and save them.
Step 4: Set up validation – Create a partitions_validation folder and move the samples you want to use for validation into it.
Step 5: Compute statistics – Run compute_stats.py to calculate the global mean and standard deviation from your training samples.
Step 6: Start training – Finally, run train.py to begin the training process.
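Step 5 amounts to computing one mean and one standard deviation per variable across all training samples. The sketch below shows one way to do that without loading every sample at once. It is not the contents of compute_stats.py, and the function name and synthetic data are hypothetical.

```python
import numpy as np


def accumulate_stats(sample_arrays):
    """Global per-channel mean and std over an iterable of (num_nodes, channels) arrays."""
    count = 0
    total = None
    total_sq = None
    for array in sample_arrays:
        array = np.asarray(array, dtype=np.float64)
        if total is None:
            total = np.zeros(array.shape[1])
            total_sq = np.zeros(array.shape[1])
        count += array.shape[0]
        total += array.sum(axis=0)
        total_sq += (array**2).sum(axis=0)
    if count == 0:
        raise ValueError("no samples provided")
    mean = total / count
    # Clamp tiny negative values caused by floating-point round-off.
    variance = np.maximum(total_sq / count - mean**2, 0.0)
    return mean, np.sqrt(variance)


rng = np.random.default_rng(0)
# Synthetic stand-ins for per-sample node features; node counts differ per sample.
samples = [rng.normal(loc=3.0, scale=2.0, size=(n, 4)) for n in (120, 80, 200)]

mean, std = accumulate_stats(samples)
normalized_first = (samples[0] - mean) / std
print(mean, std)
```

The sum-of-squares formulation is simple but can lose precision when the mean is large relative to the spread; accumulating in float64, as done here, is usually enough for normalization statistics.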

How to deploy and perform inference with NVIDIA NIM

Once a surrogate model is trained, the next step is deployment into engineering workflows. NVIDIA NIM microservices simplify this by exposing pretrained AI models through standard APIs for seamless integration into pipelines. NIM microservices can run locally, in the cloud, or at the edge. This reference workflow uses the DoMINO-Automotive-Aero NIM, which accepts 3D geometries in STL format and outputs predicted surface and volume fields.

While the pretrained DoMINO model performs well, most applications benefit from adaptation. Fine-tuning requires far less data than training from scratch and improves accuracy with minimal effort. By leveraging physics knowledge from pretraining, fine-tuning adapts models to new conditions and geometries, making them more accurate, efficient, and robust. As a result, the DoMINO-Automotive-Aero NIM can generalize beyond automotive to other physics-based applications.

Running DoMINO-Automotive-Aero NIM

The DoMINO-Automotive-Aero NIM is a powerful tool for vehicle aerodynamic simulations. To achieve the best results, it’s important to understand the inputs and outputs of the model.

Inputs

The model requires specific inputs to run a simulation and generate predictions:

Vehicle STL: You must provide the vehicle’s geometry as a watertight, correctly oriented STL file.
Velocity inlet: Specify the inflow velocity at the vehicle’s front. For optimal accuracy, this should be between 20 and 60 m/s.
Stencil size: This parameter controls the model’s accuracy. Larger stencil sizes lead to more accurate results but also increase computational cost and runtime.
Volume sampling points: The number of points you choose to sample in the volume around the vehicle determines the fidelity of the predicted flow fields. A higher number of points provides more detailed results but increases computational cost.

The following code snippet demonstrates how to send an STL file and other parameters to the inference API:

import httpx
import io
import numpy as np

# Define the URL for the inference API
url = "http://localhost:8000/v1/infer"

# Define the parameters for the inference request
data = {
    "stream_velocity": "30.0",
    "stencil_size": "1",
    "point_cloud_size": "500000",
}

# Open the STL file and send it to the NIM
with open("your_vehicle.stl", "rb") as stl_file:
    files = {"design_stl": ("your_vehicle.stl", stl_file)}
    r = httpx.post(url, files=files, data=data, timeout=120.0)

# Fail early on an HTTP error instead of parsing an error body as NumPy data
r.raise_for_status()

Outputs

The NIM provides a variety of outputs to help you analyze the simulation:

Surface flow parameters: The model predicts pressure distributions and near wall quantities such as wall shear stress directly on the vehicle’s surface.
Volume flow fields: You will get predictions for fluid dynamic parameters like velocity, pressure, and turbulent flow parameters at the sampled points in the volume around the vehicle.
Additional properties: The output also includes the Signed Distance Field (SDF), which helps with visualizing the geometry relative to the flow, and bounding box details that define the computational domain.

The response from the API is a compressed NumPy array. The following code snippet shows how to load and inspect the output:

# Load the response content into a NumPy array
with np.load(io.BytesIO(r.content)) as output_data:
    output_dict = {key: output_data[key] for key in output_data.keys()}
    print(output_dict.keys())
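Before visualizing the results, it helps to save the response to disk and check what each array contains. The following sketch does both. It uses a synthetic payload so it runs without a NIM; the array names pressure and coordinates, the file name, and the helper name are hypothetical and are not the keys the NIM returns.

```python
import io
import tempfile
from pathlib import Path

import numpy as np


def save_and_summarize(npz_bytes, output_path):
    """Write the raw .npz payload to disk and return a per-array summary."""
    Path(output_path).write_bytes(npz_bytes)
    summary = {}
    with np.load(io.BytesIO(npz_bytes)) as archive:
        for key in archive.files:
            array = archive[key]
            entry = {"shape": array.shape, "dtype": str(array.dtype)}
            if array.size and np.issubdtype(array.dtype, np.number):
                entry["min"] = float(array.min())
                entry["max"] = float(array.max())
            summary[key] = entry
    return summary


# Synthetic payload standing in for the bytes of an inference response.
buffer = io.BytesIO()
np.savez_compressed(
    buffer, pressure=np.linspace(-1.0, 1.0, 5), coordinates=np.zeros((5, 3))
)
payload = buffer.getvalue()

with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = Path(tmp_dir) / "inference_results.npz"
    summary = save_and_summarize(payload, saved_path)
    with np.load(saved_path) as reloaded:
        reloaded_keys = sorted(reloaded.files)

for key, entry in summary.items():
    print(key, entry)
```

With a real request, pass r.content as npz_bytes and choose a persistent output path; the saved .npz file is the same archive the server returned.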

Real-time interactive visualization with NVIDIA Omniverse APIs

NVIDIA Omniverse is a platform for building, connecting, and operating 3D applications using OpenUSD and RTX technologies. It enables real-time collaboration, high-fidelity virtual worlds, and is applied in product design, robotics, autonomous vehicles, and facility simulation.

In CAE, AI surrogate model predictions can be visualized with Kit-CAE, an Omniverse sample for data processing and rendering. Kit-CAE supports scientific datasets in formats like HDF5, CGNS, VTK, and NumPy, enabling direct use in Omniverse. Its extensions allow visualization through streamlines, volumes, and glyphs, while developers can customize it to add new workflows.

To visualize predictions from the DoMINO-Automotive-Aero NIM and car geometry from the corresponding STL file in Omniverse, follow these steps:

Step 1: Run inference – Call the NIM with a given STL file as described in the previous section, and save the inference results as a NumPy archive (.npz).
Step 2: Install Kit-CAE – Clone the GitHub repository and follow the instructions in its README to build and launch the Kit-CAE sample application.
Step 3: Import files – Use the import menu to convert the STL and NPZ files to OpenUSD and import them into the 3D scene. The NumPy file should be imported as a point cloud dataset.
Step 4: Visualize results – Right-click on the imported NumPy dataset primitive and use the context menu to run one of the pre-defined CAE algorithms for data processing and visualization. You may need to specify the field array of interest based on the chosen method.

Screenshot showing a volume visualization of a predicted turbulent kinetic energy field using Kit-CAE. — *Figure 4. Volume visualization of a predicted turbulent kinetic energy field using Kit-CAE*

Bringing simulation results into Omniverse enables advanced 3D applications for CAE. Using Omniverse SDKs, APIs, and microservices, real-time digital twins can combine high-fidelity assets with simulation data from Kit-CAE. The low latency of AI surrogate models makes interactive exploration of large design spaces feasible, providing immediate feedback in 3D.

To create a more comprehensive and interactive experience, users can adopt the Omniverse Blueprint for real-time computer-aided engineering digital twins, which provides a more advanced reference workflow. Learn more about the virtual wind tunnel experience and find its source code on GitHub.

Video 1. Learn how the NVIDIA Omniverse Blueprint enables real-time digital twins for CAE

Get started with AI-powered simulations

The AI-powered simulation workflow, demonstrated with automotive aerodynamics in this post, extends to other simulation-driven industries. It uses high-fidelity simulation data to train fast AI models that enable real-time, interactive analysis, addressing long runtimes, costly design iterations, and slow feedback loops. To get started, check out the NVIDIA DLI course, Accelerating CAE Simulations Using PhysicsNeMo.

Applications include:

Aerospace: Accelerating airfoil and aircraft optimization; analyzing aerothermal effects during re-entry. NVIDIA partner Luminary Cloud used this framework with the DoMINO architecture to train Shift Wing, a physics-based AI model for transonic wing design.
Energy: Optimizing turbomachinery flow, heat exchangers, and wind farm layouts.
Manufacturing: Speeding injection mold analysis and tooling design.
Civil engineering: Rapid evaluation of wind loading and airflow in tunnels or subway systems.
Electronics: Real-time thermal analysis for data centers, servers, and battery packs.

This GPU-accelerated platform is a foundational technology that advances simulation and engineering design processes.