Spotlight: Accelerating the Discovery of New Battery Materials with SES AI’s Molecular Universe

May 08, 2025

By Kang Xu, Dan Hannah, Yumin Zhang and Qichao Hu

AI-Generated Summary

The Molecular Universe contains an estimated 100 billion to a trillion possible electrolyte materials, but human-powered approaches have only explored a small fraction of this space, with fewer than 1,000 molecules studied in the past 30 years.
SES AI is using NVIDIA hardware and software to build a comprehensive database of molecular structures and properties, accelerating machine learning and Density Functional Theory calculations by over 80x using NVIDIA ALCHEMI on NVIDIA H100 and A100 GPUs.
By leveraging NVIDIA CUDA-based libraries like cuML and NeMo, SES AI has reduced battery research from decades to months and created an interactive map of the Molecular Universe, enabling researchers to efficiently explore the vast chemical space and identify promising new battery chemistries.

From the Stone Age to the digital era, materials have been the foundation of civilization. Today, the discovery of new materials drives progress in energy, medicine, and technology, opening up a future of endless possibilities. Challenges remain, however: human-powered approaches to finding new materials have been slow, costly, unpredictable, and limited to a small region of chemical space.

The electrolyte materials that scientists have studied over the past 30 years involve fewer than 1,000 different molecules, and fewer than 100 molecules have been used in the latest lithium-ion batteries. The number of possible electrolyte materials left untouched in the whole chemical design space, the Molecular Universe, is astronomical, owing to the combinatorial power of elements with high connectivity to one another. Estimates range from 100 billion to a trillion, depending on the constraints imposed, which is comparable to the population of stars in the observable universe (Figure 1).

Exploring such an immense number of molecules was unthinkable just a few years ago, but now, with the rapid development of AI techniques, powerful GPUs, and CUDA-based software, scientists can explore the entire Molecular Universe and precisely identify molecules that can enable battery chemistries with superior energy density, safety, and cost.

Mapping the Molecular Universe

SES AI, a company specializing in battery innovation, is using NVIDIA hardware and software to build a “map” of the Molecular Universe. The map, a comprehensive database of molecular structures and properties, will aid the search for emerging battery chemistries by giving scientists an easy way to navigate chemical space. For example, they can look for molecules with properties tailored to a specific application, but in novel structural classes that may avoid some of the problems inherent to the molecules currently in use.

Using NVIDIA ALCHEMI to accelerate machine learning and Density Functional Theory (DFT) calculations on NVIDIA H100 and A100 GPUs by over 80x, SES can rapidly solve structures for molecules in entire “galaxies” and fully calculate the basic physicochemical properties of over 121 million molecules. These properties include the energy levels of the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO), the minimum and maximum electrostatic potentials (ESP min/max), molecular polarizability, and more. By combining insights from mapping the Molecular Universe with SES’s domain-adapted LLMs, and leveraging the NVIDIA NeMo Framework and NVIDIA DGX Cloud for training, SES AI reduces battery research from decades down to months.

Interactive map

While a large database is useful in its own right, millions of rows of molecular data do not, by themselves, illuminate a path for exploring the universe efficiently. No battery company in the world, SES included, has the resources to investigate hundreds of millions of molecules manually and exhaustively. Instead, SES scientists sought a way to ensure their screening efforts sampled a sufficiently broad swath of chemical space. Verifying the breadth of a screening effort required a way of understanding how all the molecules in the Molecular Universe are related to one another. Existing methods provided a means of doing this in theory: molecular fingerprints, which encode molecules as numbers, can be used to position molecules relative to one another, and their high dimensionality can be reduced to a human-interpretable two dimensions using state-of-the-art techniques like Uniform Manifold Approximation and Projection (UMAP). However, there was one major problem: there was simply too much data for conventional CPU-based implementations of these algorithms to handle.
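To make the fingerprint idea concrete, here is a minimal sketch in plain Python. It assumes each fingerprint is stored as the set of its "on" bit positions and compares molecules with the Tanimoto (Jaccard) distance, a common choice for fingerprints. The source does not say which fingerprint or distance metric SES uses, and the three fingerprints below are made-up toy values, not real molecules.

```python
from itertools import combinations


def tanimoto_distance(fp_a: frozenset, fp_b: frozenset) -> float:
    """Return 1 - |A & B| / |A | B| for two sets of on-bit indices."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / len(union)


# Hypothetical toy fingerprints (illustrative bit positions only).
fingerprints = {
    "mol_a": frozenset({1, 4, 7, 9}),
    "mol_b": frozenset({1, 4, 7, 12}),
    "mol_c": frozenset({20, 21, 22}),
}

# Pairwise distances: small values mean structurally similar molecules.
distances = {
    (a, b): tanimoto_distance(fingerprints[a], fingerprints[b])
    for a, b in combinations(sorted(fingerprints), 2)
}

for pair, dist in distances.items():
    print(pair, round(dist, 3))
```

Here mol_a and mol_b share most of their bits and end up close together, while mol_c shares none and sits far from both. A dimensionality reduction method such as UMAP works from this kind of neighbor relationship to place molecules on a two-dimensional map.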

For example, applying UMAP to a fraction of this database (14 million molecules) requires more than four hours of computation time on CPU for each run. Compounding the issue, UMAP frequently requires parameter tuning for optimal results, meaning that each dataset potentially generates more than 100 hours of computing time (assuming a grid search of UMAP’s adjustable parameters requires 25 runs) before a satisfactory result can be obtained. For a database that is still growing and changing, devoting more than 100 hours of computation time to recalculating the reduced-dimensionality coordinates for each update was simply intractable.
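The grid-search estimate in the preceding paragraph is a single multiplication. The short sketch below spells it out using the figures from the text (about four CPU hours per run, 25 runs); the variable names are illustrative.

```python
HOURS_PER_CPU_RUN = 4      # approximate CPU time per UMAP run on 14 million molecules
RUNS_IN_GRID_SEARCH = 25   # grid-search size assumed in the text

total_hours = HOURS_PER_CPU_RUN * RUNS_IN_GRID_SEARCH
total_days = total_hours / 24

print(f"About {total_hours} CPU hours ({total_days:.1f} days) per database update")
```

At roughly four days of continuous CPU time per update, re-tuning the map each time the database grows quickly becomes impractical.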

Fortunately, the NVIDIA cuML library provides a set of CUDA-based GPU-accelerated algorithms that include UMAP, reducing the time for each run from hours to minutes. With this faster approach, SES was able to apply and optimize UMAP for a set of 14 million molecules in just a single day. The results of this effort are shown in Figure 2, which positions molecules in a two-dimensional space representative of structural similarity. “Galaxies” are evident as clusters of like molecules, which represent the different categories one must search to adequately sample a chemically diverse range of candidates.
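As a rough illustration of what such a GPU-based parameter sweep could look like, the sketch below runs cuML's `UMAP` class (`cuml.manifold.UMAP`) once per grid point. This is not SES's actual pipeline: the 5 x 5 grid values are hypothetical (the text only states that 25 runs were assumed), the input is assumed to be an `(n_molecules, n_bits)` float32 array of fingerprints, and how the resulting embeddings were scored is not described in the source, so selection is left out.

```python
from itertools import product

# Hypothetical 5 x 5 grid; the source states only that 25 runs were assumed.
N_NEIGHBORS_VALUES = [5, 15, 30, 50, 100]
MIN_DIST_VALUES = [0.0, 0.1, 0.25, 0.5, 0.8]


def umap_param_grid():
    """Return every (n_neighbors, min_dist) combination as keyword dicts."""
    return [
        {"n_neighbors": n, "min_dist": d}
        for n, d in product(N_NEIGHBORS_VALUES, MIN_DIST_VALUES)
    ]


def embed_all(fingerprints):
    """Run GPU UMAP once per grid point; requires cuML and an NVIDIA GPU.

    `fingerprints` is assumed to be an (n_molecules, n_bits) float32 array.
    Returns a list of (params, embedding) pairs, each embedding 2-D.
    """
    # Imported lazily so the grid helper above works on machines without a GPU.
    from cuml.manifold import UMAP

    results = []
    for params in umap_param_grid():
        reducer = UMAP(n_components=2, **params)
        results.append((params, reducer.fit_transform(fingerprints)))
    return results


print(len(umap_param_grid()), "UMAP configurations")
```

Because cuML's `UMAP` follows the same `fit_transform` convention as the CPU `umap-learn` package, a sweep like this can usually be moved between CPU and GPU by changing only the import.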

Because cuML’s speed-up was so significant, SES was also able to rapidly expand its efforts to the screening of anionic species, a key electrolyte salt sub-component responsible for forming desirable interphases on electrode surfaces. Once again applying cuML’s GPU-accelerated implementation of UMAP, SES produced a structurally sensitive map of the anion universe (Figure 3).

In both cases, SES was also able to leverage cuML’s implementation of HDBSCAN, a clustering method well-suited to complex datasets, to automatically label each molecule’s “home galaxy”. This greatly facilitates the automation of molecular search efforts, because code can use the cluster labels as a hook for stratified sampling, ensuring that all of the universe’s galaxies are adequately represented.
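The cluster-then-sample idea can be sketched as follows. The `label_galaxies` helper calls cuML's `HDBSCAN` class (`cuml.cluster.HDBSCAN`) and needs a GPU; its `min_cluster_size` value is a hypothetical placeholder, not a setting reported by SES. The `stratified_sample` helper is plain Python and is one simple way to draw the same number of candidates from every cluster, demonstrated here on toy labels rather than real data.

```python
import random
from collections import defaultdict


def label_galaxies(embedding, min_cluster_size=500):
    """Cluster 2-D UMAP coordinates with cuML HDBSCAN (needs an NVIDIA GPU).

    min_cluster_size is a hypothetical value, not one reported by SES.
    Returns one integer label per molecule; -1 marks noise points.
    """
    # Imported lazily so the sampling helper below runs without cuML.
    from cuml.cluster import HDBSCAN

    return HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(embedding)


def stratified_sample(labels, per_cluster, seed=0):
    """Pick up to `per_cluster` molecule indices from every non-noise cluster."""
    members = defaultdict(list)
    for index, label in enumerate(labels):
        if label != -1:  # skip points HDBSCAN could not assign to a cluster
            members[int(label)].append(index)

    rng = random.Random(seed)
    return {
        label: sorted(rng.sample(indices, min(per_cluster, len(indices))))
        for label, indices in sorted(members.items())
    }


# Toy labels standing in for HDBSCAN output (three clusters plus one noise point).
toy_labels = [0, 0, 0, 0, 1, 1, -1, 2, 2, 2]
sample = stratified_sample(toy_labels, per_cluster=2)
print(sample)
```

Sampling a fixed number per cluster, rather than uniformly over all molecules, keeps small galaxies from being crowded out by large ones. For millions of molecules, the labels would typically be copied to host memory and grouped with vectorized tools rather than a Python loop.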

Qichao Hu, CEO of SES AI, explains, “The goal of our Molecular Universe effort is to map the properties of small molecules so that we can develop better energy storage devices—for flying cars, humanoid robots, data centers, and more. With this collaboration with NVIDIA using the latest computation hardware and software, we’ve accelerated this process from several thousand years to just a few months.”

The Molecular Universe MU-0 tool was recently released and more details can be found on the website.

Empowering researchers worldwide

NVIDIA CUDA-based libraries are empowering researchers worldwide to accelerate materials discovery:

Use the NVIDIA cuML Python library to accelerate your machine learning workflows without any code changes.
Sign up to receive a notification when the NVIDIA Batched Geometry Relaxation NIM microservice is available for download.
Build custom generative models with NVIDIA NeMo.

Acknowledgement

The SES team wants to thank the NVIDIA team for their support. In particular, we are grateful to Jenn Yonemitsu and Brian Tepera for their invaluable help.

About the Authors

About Kang Xu
Kang Xu is the CTO of SES AI Corp. He has been an ARL Fellow and team leader in a defense lab. He was also elected a Fellow of the Materials Research Society and of the Electrochemical Society. His research interests encompass electrolyte materials and interface science, and his work has been recognized by multiple awards. Since joining SES AI Corp in 2023, he has been dedicated to the development of Molecular Universe, a universal tool for exploring the vast chemical space for materials design and discovery.

About Dan Hannah
Dan Hannah is an associate director at SES AI Corporation. At SES, Dan leads a research program focused on discovering new battery materials using machine learning, chemical informatics, and physics-driven simulations. Prior to joining SES, Dan spent several years as a data scientist in the cybersecurity industry. Dan holds a Ph.D. in Physical Chemistry from Northwestern University and did a postdoctoral fellowship at Berkeley National Lab, where his focus was the discovery of novel inorganic materials for energy applications.

About Yumin Zhang
Yumin Zhang is a technical project manager at SES AI Corporation, where she leads the development and deployment of computational frameworks powering the Molecular Universe platform. With a strong background in computational methods for battery materials and machine learning, she works at the intersection of science and scalable technology. Her recent work focuses on machine learning force fields for liquid electrolytes.

About Qichao Hu
Qichao Hu is the founder and CEO of SES AI Corp. His interests encompass physics, chemistry and materials science, battery and manufacturing technologies, and AI. His recognitions include MIT Technology Review Innovators Under 35 (TR35), Forbes 30 Under 30, and the IALB 2024 Innovation Award.
