From the Stone Age to the digital era, materials have been the foundation of civilization. Today, the discovery of new materials drives progress in energy, medicine, and technology. The possibilities are vast, but challenges remain: human-driven approaches to finding new materials have been slow, costly, unpredictable, and limited to a small chemical space.
In the past 30 years, scientists have studied fewer than 1,000 different molecules as electrolyte materials, and fewer than 100 have been used in the latest lithium-ion batteries. The number of possible electrolyte materials still untouched in the whole chemical design space, the Molecular Universe, is astronomical because of the combinatorial ways in which elements can connect to one another. Estimates range from 100 billion to a trillion molecules, depending on the constraints imposed, which is comparable to the number of stars in the observable universe (Figure 1).

Exploring such an immense number of molecules was unthinkable just a few years ago, but now, with the rapid development of AI techniques, powerful GPUs, and CUDA-based software, scientists can explore the entire Molecular Universe and precisely identify molecules that can enable battery chemistries with superior energy density, safety, and cost.
Mapping the Molecular Universe
SES AI, a company specializing in battery innovation, is using NVIDIA hardware and software to build a “map” of the Molecular Universe. The “map”, a comprehensive database of molecular structures and properties, will aid the search for emerging battery chemistries by giving scientists an easy way to navigate chemical space. For example, they can look for molecules with properties tailored to a specific application but in novel structural classes that may avoid some of the problems inherent to the molecules currently in use.
Using NVIDIA ALCHEMI to accelerate machine learning and Density Functional Theory (DFT) calculations on NVIDIA H100 and A100 GPUs by over 80x, SES AI is able to rapidly solve structures for molecules in entire “galaxies” and fully calculate the basic physicochemical properties of over 121 million molecules. These properties include the energy levels of the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO), the minimum and maximum electrostatic potentials (ESP min/max), molecular polarizability, and more. By combining insights from mapping the Molecular Universe with its domain-adapted LLMs, and by using the NVIDIA NeMo Framework and NVIDIA DGX Cloud for training, SES AI reduces battery research from decades to months.
Interactive map
While a large database is useful in its own right, millions of rows of molecular data do not, by themselves, illuminate a path for exploring the universe efficiently. No battery company in the world, SES included, has the resources to manually and exhaustively investigate hundreds of millions of molecules. Instead, SES scientists sought a way to ensure their screening efforts sampled a sufficiently broad swath of chemical space. Verifying the breadth of a screening effort required a way of understanding how all the molecules in the Molecular Universe are related to one another. Existing methods theoretically provided a means of doing this: molecular fingerprints, which encode molecules as numbers, can be used to position molecules relative to one another, and their high dimensionality can be reduced to a human-interpretable two dimensions using state-of-the-art techniques like Uniform Manifold Approximation and Projection (UMAP). However, there was one major problem: there was simply too much data for conventional CPU-based implementations of these algorithms to handle.
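To make the idea of positioning molecules by fingerprint more concrete, here is a minimal sketch rather than SES's actual pipeline. It assumes each fingerprint is stored as the set of its "on" bit positions and compares fingerprints with Tanimoto (Jaccard) similarity, a common choice for binary fingerprints. The molecule names and bit positions are invented for illustration and are not computed from real structures.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints,
    each given as the set of its 'on' bit positions."""
    if not fp_a and not fp_b:
        return 1.0  # convention: two empty fingerprints count as identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def tanimoto_distance(fp_a: set, fp_b: set) -> float:
    """Distance form: 0.0 for identical fingerprints, 1.0 for disjoint ones."""
    return 1.0 - tanimoto(fp_a, fp_b)


# Hypothetical fingerprints (bit positions are made up for this example).
fingerprints = {
    "mol_A": {1, 4, 9, 16},
    "mol_B": {1, 4, 9, 25},   # shares three bits with mol_A
    "mol_C": {2, 3, 5, 7},    # shares no bits with mol_A
}

sim_ab = tanimoto(fingerprints["mol_A"], fingerprints["mol_B"])
sim_ac = tanimoto(fingerprints["mol_A"], fingerprints["mol_C"])
print(f"A vs B: {sim_ab:.2f}, A vs C: {sim_ac:.2f}")
```

Pairwise distances of this kind are what a dimensionality reduction method works from when it places structurally similar molecules close together on a two-dimensional map.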
For example, applying UMAP to a fraction of this database (14 million molecules) requires more than four hours of computation time on CPU for each run. Compounding the issue, UMAP frequently requires parameter tuning for optimal results, meaning that each dataset potentially requires more than 100 hours of computing time (assuming a grid search of UMAP’s adjustable parameters takes 25 runs) before a satisfactory result can be obtained. For a database that is still growing and changing, devoting more than 100 hours of computation time to recalculating the reduced-dimensionality coordinates for each update was simply intractable.
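The cost estimate in this paragraph is simple arithmetic, sketched below using only the two figures quoted in the text: roughly four hours per CPU run and 25 runs per grid search. The function and constant names are illustrative.

```python
HOURS_PER_RUN = 4   # approximate CPU time per UMAP run on 14M molecules (from the text)
GRID_RUNS = 25      # runs assumed for one grid search (from the text)


def tuning_hours(hours_per_run: float, runs: int) -> float:
    """Total compute time for a grid search that runs sequentially."""
    return hours_per_run * runs


total = tuning_hours(HOURS_PER_RUN, GRID_RUNS)
print(f"{total} hours, or about {total / 24:.1f} days per database update")
```

Because every database update repeats the whole search, this cost recurs each time new molecules are added.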
Fortunately, the NVIDIA cuML library provides a set of CUDA-based GPU-accelerated algorithms that include UMAP, reducing the time for each run from hours to minutes. With this faster approach, SES was able to apply and optimize UMAP for a set of 14M molecules in just a single day. The results of this effort are shown in Figure 2, which positions molecules in a two-dimensional space representative of structural similarity. “Galaxies” are evident as clusters of like molecules, which represent the different categories one must search to adequately sample a chemically diverse range of candidates.
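A grid search like the one described could be sketched as follows. This is an illustration under stated assumptions, not SES's code: the 5 x 5 grid values are hypothetical (chosen only to give 25 runs), and `fingerprints` is assumed to be a 2D numeric array with one row per molecule. The sketch uses cuML's `UMAP` class with its `n_neighbors`, `min_dist`, and `n_components` parameters, and it defers the cuML import so that the grid itself can be inspected on a machine without a GPU.

```python
from itertools import product

# Hypothetical 5 x 5 grid over two commonly tuned UMAP parameters (25 runs).
N_NEIGHBORS = [5, 15, 30, 50, 100]
MIN_DIST = [0.0, 0.1, 0.25, 0.5, 0.8]

param_grid = [
    {"n_neighbors": n, "min_dist": d, "n_components": 2}
    for n, d in product(N_NEIGHBORS, MIN_DIST)
]


def embed_all(fingerprints, grid=param_grid):
    """Yield (params, 2D embedding) for each parameter setting.

    Requires cuML and an NVIDIA GPU. `fingerprints` is assumed to be a
    2D array-like of shape (n_molecules, n_features).
    """
    from cuml.manifold import UMAP  # GPU-accelerated UMAP, imported lazily

    for params in grid:
        yield params, UMAP(**params).fit_transform(fingerprints)


print(f"{len(param_grid)} parameter settings to evaluate")
```

Yielding one embedding at a time keeps only a single result in memory, which matters when each embedding covers millions of molecules.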

Because cuML’s speed-up was so significant, SES was also able to rapidly expand its efforts to the screening of anionic species, a key electrolyte salt sub-component responsible for forming desirable interphases on electrode surfaces. Once again applying cuML’s GPU-accelerated implementation of UMAP, SES produced a structurally sensitive map of the anion universe (Figure 3).
In both cases, SES was also able to use cuML’s implementation of HDBSCAN, a clustering method well suited to complex datasets, to automatically label each molecule’s “home galaxy”. This greatly facilitates the automation of molecular search efforts, as code can use cluster labels as a hook for stratified sampling, ensuring all of the universe’s galaxies are adequately represented.
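Stratified sampling on cluster labels can be sketched in a few lines. This is a minimal illustration, not SES's implementation: the function name, the `per_cluster` parameter, and the example labels are hypothetical. It assumes one integer label per molecule and follows the HDBSCAN convention that unclustered (noise) points are labeled -1.

```python
import random
from collections import defaultdict


def stratified_sample(labels, per_cluster, seed=0, noise_label=-1):
    """Pick up to `per_cluster` molecule indices from every cluster.

    `labels[i]` is the cluster label of molecule i. Points carrying
    `noise_label` are skipped. Returns {cluster label: sorted indices}.
    """
    rng = random.Random(seed)  # seeded so the selection is reproducible
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        if label != noise_label:
            members[label].append(idx)
    return {
        label: sorted(rng.sample(idxs, min(per_cluster, len(idxs))))
        for label, idxs in sorted(members.items())
    }


# Hypothetical labels: ten molecules in three "galaxies" plus one noise point.
labels = [0, 0, 0, 0, 1, 1, 2, -1, 2, 2]
picks = stratified_sample(labels, per_cluster=2)
print(picks)
```

In practice the labels would come from the clustering step, for example the `labels_` attribute of a fitted HDBSCAN model. Sampling the same number of molecules from each cluster keeps small galaxies from being drowned out by large ones.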

Qichao Hu, CEO of SES AI, explains, “The goal of our Molecular Universe effort is to map the properties of small molecules so that we can develop better energy storage devices—for flying cars, humanoid robots, data centers, and more. With this collaboration with NVIDIA using the latest computation hardware and software, we’ve accelerated this process from several thousand years to just a few months.”
The Molecular Universe MU-0 tool was recently released and more details can be found on the website.
Empowering researchers worldwide
NVIDIA CUDA-based libraries are empowering researchers worldwide to accelerate materials discovery:
- Use the NVIDIA cuML Python library to accelerate your machine learning workflows with no code changes required.
- Sign up to be notified when the NVIDIA Batched Geometry Relaxation NIM microservice is available for download.
- Build custom generative models with NVIDIA NeMo.
Acknowledgement
The SES team wants to thank the NVIDIA team for their support. In particular, we are grateful to Jenn Yonemitsu and Brian Tepera for their invaluable help.