For decades, computational chemistry has faced a tug-of-war between accuracy and speed. Ab initio methods like density functional theory (DFT) provide high fidelity but are computationally expensive, limiting researchers to systems of a few hundred atoms. Conversely, classical force fields are fast but often lack the chemical accuracy required for complex bond-breaking or transition-state analysis.
Machine learning interatomic potentials (MLIPs) have emerged as the bridge, offering quantum accuracy at classical speeds. However, the software ecosystem is a new bottleneck. While the MLIP models themselves run on GPUs, the surrounding simulation infrastructure often relies on legacy CPU-centric code.
NVIDIA ALCHEMI (AI Lab for Chemistry and Materials Innovation) helps address these challenges by accelerating chemicals and materials discovery with AI. We have previously announced two components of the ALCHEMI portfolio:
- ALCHEMI NIM microservices: Scalable, cloud‑ready microservices for AI-accelerated batched atomistic simulations in chemistry and materials science
- ALCHEMI Toolkit-Ops: A set of foundational GPU kernels designed to accelerate the calculations behind simulations, such as neighbor lists, dispersion corrections, and electrostatics
Today, we are introducing the NVIDIA ALCHEMI Toolkit, a collection of GPU-accelerated simulation building blocks that incorporates and expands on ALCHEMI Toolkit-Ops. The Toolkit manages the data flow between accelerated, domain-specific chemistry and materials kernels and deep learning models, and it extends beyond individual models and kernels to provide a modular, PyTorch-native structure for researchers and developers to compose custom simulation workflows.
Figure 1 shows the ALCHEMI architectural stack and product features supported in this initial release of ALCHEMI Toolkit, including expanded functionality in Toolkit-Ops. This release includes capabilities for geometry relaxation and molecular dynamics, and the supporting pipeline infrastructure for combining multiple simulation workflows.

How does ALCHEMI Toolkit advance digital chemistry?
ALCHEMI Toolkit is not just a collection of scripts. It’s designed to enable researchers and developers to build custom, performant atomistic simulation workflows with ease.
Expanding ALCHEMI Toolkit-Ops
ALCHEMI Toolkit leverages the capabilities of Toolkit-Ops to handle the underlying calculations of the simulations. The previous release included several key operations:
- Neighbor list constructions
- DFT-D3 dispersion corrections
- Long-range electrostatic interactions
This release broadens the scope of common operations addressed to include:
- Batched dynamics kernels
- JAX support (for v0.2.0 release features)
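To make concrete what operations such as neighbor-list construction compute, here is a minimal O(N²) NumPy reference (illustrative only; Toolkit-Ops replaces this kind of loop with optimized, batched GPU kernels):

```python
import numpy as np

def brute_force_neighbor_list(positions: np.ndarray, cutoff: float) -> np.ndarray:
    """Reference neighbor list: all pairs (i, j), i != j, within cutoff.

    Illustrative CPU sketch only; not the Toolkit-Ops implementation.
    """
    deltas = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(deltas, axis=-1)
    # exclude self-pairs on the diagonal
    mask = (dist < cutoff) & ~np.eye(len(positions), dtype=bool)
    i, j = np.nonzero(mask)
    return np.stack([i, j])  # COO edge index, shape (2, num_pairs)

positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
edges = brute_force_neighbor_list(positions, cutoff=2.0)
# only atoms 0 and 1 are within the cutoff of each other
```

The GPU kernels perform this same pair search across an entire batch of systems in a single launch, which is where the speedups reported below come from.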
Integration with the atomistic simulation ecosystem
ALCHEMI Toolkit is designed to integrate seamlessly with the broader atomistic simulation ecosystem. We’re excited to announce the following integrations with leading platforms in the chemistry and materials science community.
Orbital
Orbital develops advanced AI foundation models used to accelerate the discovery of novel cooling systems for data centers and sustainable materials. Orbital has integrated ALCHEMI Toolkit into their new OrbMolv2 model to drastically reduce the time required for inference. The new model will leverage ALCHEMI Toolkit components such as PME electrostatics for periodic Coulomb interactions and the MTK integrator for batched constant-pressure molecular dynamics. The existing Orb models already leverage Toolkit-Ops for GPU-accelerated graph construction, providing a ~1.7x acceleration for large systems and ~33x for batched smaller systems with TorchSim support.
Materials Graph Library (MatGL)
MatGL is an open source framework for state-of-the-art graph-based MLIPs. ALCHEMI Toolkit is integrating with the MatGL TensorNet model to significantly accelerate materials simulations and property predictions workflows. By leveraging ALCHEMI Toolkit GPU-native kernels and batching infrastructure, MatGL users can achieve higher computational efficiency and lower memory consumption for simulations at scale.
Matlantis
Matlantis enables rapid materials discovery by combining universal MLIPs with high-performance cloud computing. Matlantis is actively exploring the ALCHEMI Toolkit and identifying where its composable dynamics can deliver the greatest value for industrial materials simulation customers. This builds on its proven integration of ALCHEMI Toolkit-Ops—including Warp-optimized neighbor list construction and DFT-D3 dispersion corrections—which significantly reduces computational overhead of atomistic interactions with speedups of up to 10x.
Furthermore, by evaluating specific components within ALCHEMI Toolkit, this collaboration has the potential to enable Matlantis to move beyond single-structure optimization to high-throughput, parallel relaxation of millions of molecular configurations. Ultimately, this integration aims to further power small-scale research and industrial-scale materials design, accelerating chemical evaluation with unparalleled GPU efficiency.
How to get started with ALCHEMI Toolkit
ALCHEMI Toolkit is straightforward to set up. This section covers the system requirements and installation options.
System and package requirements
- Python ≥3.11, <3.14
- PyTorch ≥2.8
- CUDA Toolkit 12+, NVIDIA driver 470.57.02+
- Operating System: Linux (primary), macOS
- NVIDIA GPU (RTX 20xx or newer), CUDA Compute Capability ≥ 7.0
- Minimum 4 GB RAM (16 GB recommended for large systems)
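A quick sanity check against these requirements, using only the standard library plus PyTorch if it is already installed, might look like:

```python
import sys

def check_python() -> bool:
    # Python >=3.11, <3.14 per the requirements above
    return (3, 11) <= sys.version_info[:2] < (3, 14)

def check_gpu() -> str:
    # CUDA compute capability >= 7.0 per the requirements above
    try:
        import torch
    except ImportError:
        return "PyTorch not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    capability = torch.cuda.get_device_capability()
    return "OK" if capability >= (7, 0) else f"compute capability {capability} too low"

print("Python:", "OK" if check_python() else "unsupported")
print("GPU:", check_gpu())
```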
Installation
Use the following code to install ALCHEMI Toolkit:
# Install Atomic Simulation Environment (ASE, used in the examples below)
uv pip install ase
# Using pip
pip install nvalchemi-toolkit
# Using uv
uv venv --seed --python 3.12
uv pip install nvalchemi-toolkit
# Install from source
git clone https://github.com/NVIDIA/nvalchemi-toolkit.git
cd nvalchemi-toolkit
uv sync --all-extras
# Add nvalchemi as a project dependency
uv add nvalchemi-toolkit
For more information, reference the NVIDIA/nvalchemi-toolkit GitHub repo and the ALCHEMI Toolkit documentation.
Key features of ALCHEMI Toolkit for building end-to-end workflows
This section dives into four core ALCHEMI Toolkit features: customizable batched simulation workflows, build-your-own dynamics classes, model wrappers, and advanced data management. These features provide researchers and developers with the tools and flexibility needed to create bespoke end-to-end workflows that maximize efficiency and performance on NVIDIA GPUs.
Customizable batched simulation workflows
The distinctive feature of the NVIDIA ALCHEMI Toolkit is the GPU-native batched dynamics engine. No single MLIP model is perfect for every chemical environment, especially when dealing with nonlocal, long-range interactions.
ALCHEMI Toolkit enables researchers to combine modular chemistry and materials science domain-specific kernels and models into customized simulation workflows. This architecture supports the development of specialized compute workflows and running virtual laboratories with millions of concurrent atomic interactions without the latency of traditional software stacks.
Capabilities
- Composable calculators combining MLIPs with physics-based corrections
- High-performance wrappers (MACE, TensorNet, AIMNet2)
API example
The following example constructs the data, sets up the MLIP, and configures a FIRE2 geometry optimization that is then used as a starting point for velocity Verlet (microcanonical) dynamics:
from ase import Atoms
from nvalchemi.data import AtomicData, Batch
from nvalchemi.dynamics import ConvergenceHook
from nvalchemi.dynamics.optimizers import FIRE2
from nvalchemi.dynamics.integrator import VelocityVerlet

# set up a batch of atomic structures
atomic_data = [AtomicData.from_atoms(Atoms(...), device="cuda") for _ in range(16)]
batch = Batch.from_data_list(atomic_data)

# set up your MLIP and dynamics classes
mlip = ...

# optimizer convergence depends on the force norm and max values
conv_criteria = ConvergenceHook(
    criteria=[
        {"key": "forces", "threshold": 0.05, "reduce_op": "norm"},
        {"key": "forces", "threshold": 0.1, "reduce_op": "max"},
    ]
)
optimizer = FIRE2(
    mlip,
    convergence_hook=conv_criteria,
    n_steps=200,
)
velverlet = VelocityVerlet(mlip, n_steps=1000)
You can run and scale the simulation pipelines in one of two ways: on a single GPU, or across multiple CPUs and GPUs.
Run and scale the pipeline on a single GPU: The FusedStage class is formed by “adding” two or more dynamics objects together. This enables wrapping the end-to-end workflow in torch.compile and sharing CUDA stream contexts.
fused = optimizer + velverlet

# context manager handles compilation and CUDA stream
with fused:
    # runs 200 steps of optimization and 1000 steps of MD
    fused.run(batch)
With this approach, you can easily build simulation workflows in which samples within the batch advance to the next stage as soon as they converge, making optimal use of your GPU.
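The `+` composition above relies on Python operator overloading. A hypothetical sketch of the pattern (not the actual FusedStage implementation, which also handles `torch.compile` and CUDA streams):

```python
class Stage:
    def __init__(self, name: str, n_steps: int):
        self.name = name
        self.n_steps = n_steps

    def run(self, batch):
        # a real stage would advance `batch` by n_steps of dynamics
        return f"{self.name} x{self.n_steps}"

    def __add__(self, other: "Stage") -> "FusedStage":
        return FusedStage([self, other])

class FusedStage:
    def __init__(self, stages: list):
        self.stages = stages

    def __enter__(self):
        # a real implementation compiles the fused graph and enters a CUDA stream
        return self

    def __exit__(self, *exc):
        return False

    def run(self, batch):
        # stages run back to back on the same batch and device context
        return [stage.run(batch) for stage in self.stages]

fused = Stage("FIRE2", 200) + Stage("VelocityVerlet", 1000)
with fused:
    results = fused.run(batch=None)  # ['FIRE2 x200', 'VelocityVerlet x1000']
```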
Run and scale the pipeline across multiple CPUs and GPUs: The second approach is to distribute the pipeline across multiple CPUs and GPUs. Using the pipe operator on two dynamics classes distributes the FIRE2 optimization onto one GPU and the velocity Verlet integration onto another.
pipeline = optimizer | velverlet
# equivalent to manual allocation with explicit producer/consumer:
# optimizer.next_rank = 1, velverlet.prior_rank = 0
# DistributedPipeline({0: optimizer, 1: velverlet})
with pipeline:
    pipeline.run(batch)
While this example is deliberately simplified for illustrative purposes, this abstraction allows users to scale a pipeline up to multiple GPUs on a node, and out across multiple nodes to arbitrarily large datasets and numbers of ranks.
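As with `+`, the pipe operator is plain operator overloading. A hypothetical sketch of how `|` could wire producer and consumer ranks (the real DistributedPipeline additionally moves batches between ranks over torch.distributed):

```python
class Stage:
    def __init__(self, name: str):
        self.name = name
        self.prior_rank = None  # rank this stage receives batches from
        self.next_rank = None   # rank this stage sends results to

    def __or__(self, other: "Stage") -> dict:
        # producer on rank 0 feeds consumer on rank 1
        self.next_rank = 1
        other.prior_rank = 0
        return {0: self, 1: other}

pipeline = Stage("FIRE2") | Stage("Langevin")
# a rank-to-stage mapping: {0: <FIRE2 stage>, 1: <Langevin stage>}
```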
The following example configures eight GPUs to run geometry optimization and pipelines the results to Langevin dynamics running on another eight GPUs:
from torch import distributed as dist
from torch.utils.data.distributed import DistributedSampler
from nvalchemi.data.datapipes import Dataset, DataLoader

# set up distributed; torchrun --nproc-per-node 8 --nnodes 2 ...
dist.init_process_group()

# set up data and distributed sampler
dataset = Dataset(...)
data_sampler = DistributedSampler(
    dataset,
    num_replicas=dist.get_world_size(),
    rank=dist.get_rank(),
)
loader = DataLoader(
    dataset,
    batch_size=128,
    sampler=data_sampler,
    use_stream=True,
)

# configure your pipeline; 8 ranks do optimization, 8 do Langevin dynamics
optimizers = [FIRE2(mlip, ..., next_rank=index + 8) for index in range(8)]
dynamics = [Langevin(mlip, ..., prior_rank=index) for index in range(8)]
pipeline = DistributedPipeline(
    {index: stage for index, stage in enumerate(optimizers + dynamics)}
)
with pipeline:
    for batch in loader:
        pipeline.run(batch)
Build-your-own dynamics classes
ALCHEMI Toolkit offers a modular architecture to build and customize dynamics classes from the ground up. This enables the community to integrate new sampling methods or thermodynamic ensembles into the ALCHEMI environment while maintaining direct access to the underlying kernels.
Capabilities
- Specialized GPU-first trajectory analysis tools
- Integrated and customizable dynamics kernels (Velocity Verlet, NPT, Langevin thermostats)
- FIRE and FIRE2 optimizers
API example
from enum import Enum

import torch

from nvalchemi.data import Batch
from nvalchemi.dynamics import ConvergenceHook
from nvalchemi.dynamics.base import BaseDynamics, DynamicsStage
from nvalchemi.hooks import Hook, HookContext
from nvalchemi.models.base import BaseModelMixin

class MySimulatedAnnealer(Hook):
    def __init__(
        self,
        t_start: float,
        t_end: float,
        cooldown_steps: int,
        frequency: int,
        stage: DynamicsStage,
    ) -> None:
        # this hook fires every `frequency` MD steps,
        # bringing the temperature from `t_start` to `t_end`
        self.frequency = frequency
        self.t_start = t_start
        self.t_end = t_end
        self.cooldown_steps = cooldown_steps
        self.stage = stage
        self.decay = (t_end / t_start) ** (1.0 / cooldown_steps)

    def __call__(self, ctx: HookContext, stage: Enum) -> None:
        # access the calling dynamics class through `HookContext`
        dynamics = ctx.workflow
        dynamics.target_temperature = max(
            dynamics.target_temperature * self.decay,
            self.t_end,
        )

class VelocityVerlet(BaseDynamics):
    __needs_keys__ = {"energies", "forces", "masses", "velocities"}
    __provides_keys__ = {"positions"}

    def __init__(
        self,
        model: BaseModelMixin,
        n_steps: int,
        dt: float = 1.0,  # timestep
        target_temperature: float = 300.0,  # initial temperature
        tau: float = 10.0,  # coupling constant
        hooks: list[Hook] | None = None,
        convergence_hook: ConvergenceHook | dict | None = None,
        **kwargs,
    ):
        super().__init__(
            model=model, n_steps=n_steps, hooks=hooks, convergence_hook=convergence_hook
        )
        self.dt = dt
        self.target_temperature = target_temperature
        self.tau = tau
        self._prev_accelerations = None

    def pre_update(self, batch: Batch) -> None:
        # perform the first half of velocity Verlet
        with torch.no_grad():
            accelerations = batch.forces / batch.masses
            self._prev_accelerations = accelerations.clone()
            batch.positions.add_(
                batch.velocities * self.dt + 0.5 * accelerations * self.dt**2.0
            )

    def post_update(self, batch: Batch) -> None:
        # perform the second half of velocity Verlet, with thermostat
        # temperature update
        with torch.no_grad():
            new_accelerations = batch.forces / batch.masses
            batch.velocities.add_(
                0.5 * (self._prev_accelerations + new_accelerations) * self.dt
            )
            ke_per_atom = 0.5 * batch.masses * (batch.velocities**2).sum(dim=-1, keepdim=True)
            # get the total kinetic energy per system
            total_ke = scatter_add_(...)
            current_temp = 2.0 * total_ke / (batch.num_atoms * 3.0)
            ratio = self.target_temperature / current_temp
            lam = torch.sqrt(
                torch.tensor(1.0 + (self.dt / self.tau) * (ratio - 1.0))
            ).clamp(min=0.8, max=1.2)  # clamp for stability
            batch.velocities.mul_(lam)

# configure the new dynamics class
my_velverlet = VelocityVerlet(
    ...,
    hooks=[
        MySimulatedAnnealer(
            t_start=900.0,
            t_end=300.0,
            cooldown_steps=10,
            frequency=100,
            stage=DynamicsStage.BEFORE_STEP,
        )
    ],
)
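The thermostat in `post_update` rescales velocities by a Berendsen-style weak-coupling factor, λ = sqrt(1 + (Δt/τ)(T_target/T − 1)), clamped to [0.8, 1.2]. A quick pure-Python check of that factor (the helper name below is ours, not a Toolkit API):

```python
import math

def scaling_factor(dt: float, tau: float, t_target: float, t_current: float,
                   lo: float = 0.8, hi: float = 1.2) -> float:
    # same expression as `lam` in post_update above, clamped for stability
    lam = math.sqrt(1.0 + (dt / tau) * (t_target / t_current - 1.0))
    return min(max(lam, lo), hi)

# at the target temperature the factor is exactly 1 (no rescaling)
assert scaling_factor(1.0, 10.0, 300.0, 300.0) == 1.0
# a cold system is gently heated (factor > 1), a hot one cooled (factor < 1)
assert scaling_factor(1.0, 10.0, 300.0, 200.0) > 1.0
assert scaling_factor(1.0, 10.0, 300.0, 400.0) < 1.0
```

The clamp bounds keep a single step from changing velocities by more than ±20%, which prevents the thermostat from destabilizing the trajectory when the instantaneous temperature is far from the target.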
Model wrappers
With ALCHEMI Toolkit, you can use your own pretrained models with accelerated physics components. It provides the essential infrastructure for importing your own models into the pipeline, ensuring that proprietary or domain-specific architectures can leverage GPU-native orchestration. This abstracts the complexity of different model types, providing a standardized path to move from a standalone model to a production-ready, high-throughput simulation.
Capabilities
- MLIP support (MACE, TensorNet, AIMNet2)
- Composable calculators
- Standardized model configuration
API example
from typing import Any

import torch.nn as nn
from beartype import beartype

from super_mlip import BestMLIPModel
from nvalchemi._typing import ModelOutputs
from nvalchemi.data import Batch
from nvalchemi.models.base import BaseModelMixin, ModelConfig, NeighborConfig

class BestMLIPWrapper(nn.Module, BaseModelMixin):
    def __init__(self, model: BestMLIPModel, **kwargs):
        super().__init__(**kwargs)
        # ModelConfig declares model capabilities (which are frozen)
        # and runtime control (mutable) for the rest of the framework
        self.model_config = ModelConfig(
            outputs=frozenset({"energy", "forces", "hessians"}),
            # this is actually the default value
            required_inputs=frozenset({"positions", "atomic_numbers"}),
            autograd_outputs=frozenset({"forces"}),
            neighbor_config=NeighborConfig(cutoff=5.0, format="coo"),
        )

    def adapt_input(self, data: Batch, **kwargs) -> dict[str, Any]:
        # adapts the nvalchemi data structure to what is
        # expected by the model
        model_inputs = super().adapt_input(data, **kwargs)
        # dict structure expected by BestMLIPModel
        model_inputs["atom_numbers"] = data.atomic_numbers
        model_inputs["coords"] = data.positions
        return model_inputs

    def adapt_output(self, model_output: Any, data: Batch) -> ModelOutputs:
        # adapt the model outputs from the model's forward pass to the
        # format expected by nvalchemi
        output = super().adapt_output(model_output, data)
        output["energies"] = model_output["energies"]
        # check model config for expected outputs
        if "forces" in self.model_config.active_outputs:
            output["forces"] = model_output["forces"]
        return output

    # beartype decorator is optional, but will runtime type check arguments
    @beartype
    def forward(self, data: Batch, **kwargs) -> ModelOutputs:
        model_inputs = self.adapt_input(data, **kwargs)
        # calls BestMLIPModel's forward definition based on MRO
        model_outputs = super().forward(**model_inputs)
        return self.adapt_output(model_outputs, data)
Advanced data management
Traditionally, the “memory tax” of moving data between the CPU and GPU is a significant bottleneck in AI-driven discovery. ALCHEMI Toolkit acts as the specialized orchestrator for scientific data, providing the infrastructure required to build custom ingestion pipelines to move information from standard research files into optimized GPU tensors.
This enables discovery at scale, making industrial-scale simulations accessible through familiar interfaces. By standardizing how atomic information is represented and loaded, ALCHEMI Toolkit keeps data resident on the device, so the entire simulation stays on the GPU, enabling batched simulations that maximize GPU utilization and eliminate communication overhead.
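The batching idea can be sketched as follows (a common flat-batching pattern, shown here in NumPy for illustration; this is not Toolkit internals): variable-size systems are concatenated into one flat array plus an index vector mapping each atom to its system, so a single kernel launch covers the whole batch.

```python
import numpy as np

# three systems with 5, 3, and 8 atoms
systems = [np.random.rand(n, 3) for n in (5, 3, 8)]

# flatten into a single (16, 3) positions array
positions = np.concatenate(systems, axis=0)
# index vector mapping each atom back to its system:
# [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
batch_index = np.repeat(np.arange(len(systems)), [len(s) for s in systems])

# per-system reductions become segmented operations over batch_index,
# e.g. the centroid of each system:
centroids = np.stack(
    [positions[batch_index == i].mean(axis=0) for i in range(len(systems))]
)
```

Because every per-system quantity is recoverable from `batch_index`, systems of different sizes can share one device-resident tensor with no padding and no per-system kernel launches.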
Capabilities
- High-performance data loaders
- ASE and Pymatgen interface
- AtomicData and batch objects
API example
from ase.build import fcc111

from nvalchemi import AtomicData, Batch
from nvalchemi import data

atoms = fcc111(...)

# Create an AtomicData object from an ase.Atoms object
atomic_data = AtomicData.from_atoms(atoms, device="cuda")
atomic_data.node_properties
atomic_data.system_properties

# Create a Batch object from a list of AtomicData
batch = Batch.from_data_list([atomic_data, atomic_data, atomic_data])
batch.num_graphs
batch.get_data(0)
# get the first two samples
batch[:2]
# boolean-mask and key-based indexing
batch[mask]
batch["energies"]
# build a batch directly from a list of ase.Atoms
Batch.from_atoms([atoms, ...])

# Create a dataset from ase.Atoms
writer = data.AtomicDataZarrWriter("atom_dataset.zarr")
# writer amortizes overhead by writing batches of data;
# this is equivalent to writing individual samples, but more efficient
writer.write(batch)

# Read the data back from zarr
reader = data.AtomicDataZarrReader("atom_dataset.zarr")
# Dataset handles devices natively: individual samples are placed on the GPU,
# which accelerates preprocessing transforms;
# num_workers sets the number of threads used for async prefetching
dataset = data.Dataset(reader, device="cuda", num_workers=4)
dataloader = data.DataLoader(dataset, batch_size=16)
for batch in dataloader:
    ...  # do something with batch
Get started building molecular workflows with ALCHEMI Toolkit
ALCHEMI Toolkit provides researchers and developers with the low-level primitives and high-level abstractions needed to build end-to-end, GPU-native molecular workflows. Moving critical bottlenecks—such as neighbor list construction, structural relaxation, and integration steps—into the PyTorch ecosystem eliminates the host-to-device memory transfer overhead that has traditionally throttled MLIP-driven simulations.
Whether you’re composing hybrid ML or physics potentials or scaling batched molecular dynamics, ALCHEMI Toolkit exposes the necessary API hooks to manage complex tensorized states without sacrificing performance.
To accelerate your chemistry and materials science simulations and explore building your own custom workflows, visit the NVIDIA/nvalchemi-toolkit GitHub repo and ALCHEMI Toolkit documentation. As we continue to expand the library of supported operations and architectures, we encourage you to clone the repository, explore the provided Jupyter notebooks, and begin integrating these GPU-accelerated workflows into your own discovery pipelines.
Acknowledgments
We’d like to thank James Gin, Tim Duignan, Vaidas Šimkus of Orbital; Professor Shyue Ping Ong of MatGL; Susumu Ohno, Ryuhei Okuno, Jethro Tan of Matlantis for working with us to adopt NVIDIA ALCHEMI Toolkit into their platforms. We would also like to thank Nikita Fedik, Roman Zubatyuk, Atul Thakur, and Logan Ward for their contributions to this post.