Spotlight: Accelerating the Discovery of New Battery Materials with SES AI’s Molecular Universe

From the Stone Age to the digital era, materials have been the foundation of civilization. Today, finding new materials drives progress in energy, medicine, and technology. This opens a future of endless possibilities, but challenges remain: human-powered approaches to finding new materials have been slow, costly, unpredictable, and limited to a small chemical space.

The electrolyte materials that scientists have studied over the past 30 years involve fewer than 1,000 different molecules, and fewer than 100 molecules have been used in the latest lithium-ion batteries. The number of possible electrolyte materials still untouched in the whole chemical design space, the Molecular Universe, is astronomical because of the combinatorial power of elements with high connectivity to one another. The estimated number ranges from 100 billion to a trillion, depending on the constraints imposed, and is comparable to the number of stars in the observable universe (Figure 1).

Figure 1. The Molecular Universe contains an astronomical number of molecules that could be used as new materials, but humans have only touched an infinitesimal fraction of it. The known electrolyte components (solvents, additives, and salts) number fewer than a thousand, out of between one hundred billion and one trillion molecules in the entire Molecular Universe, depending on the constraints imposed on heavy-atom count and elemental diversity.

Exploring such an immense number of molecules was unthinkable just a few years ago, but now, with the rapid development of AI techniques, powerful GPUs, and CUDA-based software, scientists can explore the entire Molecular Universe and precisely identify molecules that can enable battery chemistries with superior energy density, safety, and cost.

Mapping the Molecular Universe

SES AI, a company specializing in battery innovation, is using NVIDIA hardware and software to build a “map” of the Molecular Universe. The map, a comprehensive database of molecular structures and properties, will aid the search for emerging battery chemistries by giving scientists an easy way to navigate chemical space: for example, finding molecules with properties tailored to a specific application but in novel structural classes that may avoid some of the problems inherent to the molecules currently in use.

Using NVIDIA ALCHEMI to accelerate machine learning and Density Functional Theory (DFT) calculations on NVIDIA H100 and A100 GPUs by over 80x, SES can rapidly solve structures for molecules in entire “galaxies” and fully calculate the basic physicochemical properties of over 121 million molecules. These properties include the energy levels of the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO), the maximum and minimum electrostatic potentials (ESP min/max), molecular polarizability, and more. By combining insights from mapping the Molecular Universe with SES’s domain-adapted LLMs, and leveraging the NVIDIA NeMo Framework and NVIDIA DGX Cloud for training, SES AI reduces battery research from decades to months.
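
As an illustration of what one entry in such a map might hold, the following is a minimal sketch of a property record and a simple query over it. The class name, field names, units, and every numeric value are assumptions made up for illustration; the source does not describe the actual schema of the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoleculeRecord:
    """Hypothetical record for one molecule; names and units are assumed."""
    name: str              # placeholder identifier, not a real molecule
    homo_ev: float         # HOMO energy level (eV assumed)
    lumo_ev: float         # LUMO energy level (eV assumed)
    esp_min: float         # minimum electrostatic potential (unit assumed)
    esp_max: float         # maximum electrostatic potential (unit assumed)
    polarizability: float  # molecular polarizability (unit assumed)

    @property
    def gap_ev(self) -> float:
        """HOMO-LUMO gap, defined as LUMO energy minus HOMO energy."""
        return self.lumo_ev - self.homo_ev


def filter_by_gap(records, min_gap_ev):
    """Keep only the records whose HOMO-LUMO gap is at least min_gap_ev."""
    return [r for r in records if r.gap_ev >= min_gap_ev]


# Placeholder values purely to show usage; these are not computed data.
records = [
    MoleculeRecord("mol-A", -8.0, 1.0, -0.05, 0.04, 40.0),
    MoleculeRecord("mol-B", -6.0, -1.0, -0.08, 0.06, 55.0),
]
wide_gap = filter_by_gap(records, min_gap_ev=6.0)
print([r.name for r in wide_gap])
```

A real screening pass would combine several such property criteria; the single threshold here only shows the shape of a property-based query.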

Interactive map 

While a large database is useful in its own right, millions of rows of molecular data do not, by themselves, illuminate a path for exploring the universe efficiently. No battery company in the world, SES included, has the resources to manually and exhaustively investigate hundreds of millions of molecules. Instead, SES scientists sought a way to ensure their screening efforts sampled a sufficiently broad swath of chemical space. Verifying the breadth of a screening effort required a way of understanding how all the molecules in the Molecular Universe are related to one another. Existing methods theoretically provided a means for doing this: molecular fingerprints that encode molecules as numbers can be used to position molecules relative to one another, and their high dimensionality can be reduced to a human-interpretable two dimensions using state-of-the-art techniques like Uniform Manifold Approximation and Projection (UMAP). However, there was one major problem: there was simply too much data for conventional CPU-based implementations of these algorithms to handle.
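
To make the idea of positioning molecules by fingerprint concrete, here is a minimal sketch of Tanimoto (Jaccard) similarity, a common way to compare binary fingerprints. The source does not say which fingerprint or similarity measure SES used, so this is an assumption; the bit sets below are invented placeholders, and real fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(a & b) / union


# Invented on-bit indices standing in for three molecules.
fp_x = {1, 2, 3}
fp_y = {2, 3, 4}
fp_z = {7, 8, 9}

sim_xy = tanimoto(fp_x, fp_y)  # 2 shared bits out of 4 distinct bits
sim_xz = tanimoto(fp_x, fp_z)  # no shared bits
print(sim_xy, sim_xz)
```

Molecules with many shared bits score close to 1 and land near each other; a dimensionality-reduction method then tries to preserve those neighborhoods in two dimensions.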

For example, applying UMAP to a fraction of this database (14 million molecules) requires more than four hours of computation time on CPU for each run. Compounding the issue, UMAP frequently requires parameter tuning for optimal results, meaning that each dataset potentially consumes more than 100 hours of computing time (assuming a grid search of UMAP’s adjustable parameters requires 25 runs) before a satisfactory result can be obtained. For a database that is still growing and changing, devoting more than 100 hours of computation time to recalculating the reduced-dimensionality coordinates for each update was simply intractable.
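
The cost estimate above can be reproduced with a short sketch. The 25 runs and the four-hour lower bound per CPU run come from the text; the choice of a 5 x 5 grid over `n_neighbors` and `min_dist` (two commonly tuned UMAP parameters) and the specific values are assumptions for illustration only.

```python
from itertools import product

# Hypothetical candidate values; the article only states that 25 runs are needed.
n_neighbors_grid = [5, 15, 30, 50, 100]
min_dist_grid = [0.0, 0.1, 0.25, 0.5, 0.8]

# Lower bound from the text: each CPU run on 14 million molecules takes over 4 h.
HOURS_PER_CPU_RUN = 4

grid = list(product(n_neighbors_grid, min_dist_grid))
total_cpu_hours = len(grid) * HOURS_PER_CPU_RUN

print(f"{len(grid)} runs x {HOURS_PER_CPU_RUN} h = at least {total_cpu_hours} CPU hours")
```

Because every database update would repeat the whole grid, the per-run time is the number that matters most.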

Fortunately, the NVIDIA cuML library provides a set of CUDA-based GPU-accelerated algorithms that include UMAP, reducing the time for each run from hours to minutes. With this faster approach, SES was able to apply and optimize UMAP for a set of 14M molecules in just a single day. The results of this effort are shown in Figure 2, which positions molecules in a two-dimensional space representative of structural similarity. “Galaxies” are evident as clusters of like molecules, which represent the different categories one must search to adequately sample a chemically diverse range of candidates. 
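
A single GPU-accelerated run might look like the sketch below. It assumes a machine with a CUDA-capable GPU and RAPIDS cuML installed, and it assumes the molecules have already been encoded as a numeric feature matrix; the function name and default parameter values are illustrative and are not SES's actual settings.

```python
def embed_2d(features, n_neighbors=15, min_dist=0.1, random_state=0):
    """Project an (n_molecules, n_features) array to 2-D with cuML's UMAP.

    `features` is assumed to be a numeric array (for example a NumPy array
    of fingerprint bits) that fits in GPU memory.
    """
    # Imported lazily so this module can still be loaded without a GPU.
    from cuml.manifold import UMAP

    reducer = UMAP(
        n_components=2,            # two dimensions for a human-readable map
        n_neighbors=n_neighbors,   # size of the local neighborhood
        min_dist=min_dist,         # how tightly points may pack together
        random_state=random_state,
    )
    return reducer.fit_transform(features)
```

Calling `embed_2d(features, n_neighbors=30, min_dist=0.25)` once per parameter combination is one simple way to run a tuning sweep; with runs taking minutes instead of hours, a full sweep fits in a day.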

Figure 2. A subset of the Molecular Universe (14 million molecules, shown as purple to light orange pixels) plotted in UMAP, overlain by those molecules that passed the “Electrolyte Constraints” (ca. 14,000, shown as yellow dots), which could serve as potential electrolyte solvents or additives, and those reported as electrolyte materials (ca. 400, shown as green dots) in the open literature during the past decade (2015–2024). The popular electrolyte solvent molecules EC, PC, DMC, DME, and DEE are marked as references.

Since cuML’s speed-up was so significant, SES was also able to rapidly expand its efforts to the screening of the anionic species, a key electrolyte salt sub-component responsible for forming desirable interphases on electrode surfaces. Once again, applying cuML’s GPU-accelerated implementation for UMAP, SES produced a structurally-sensitive map of the anion universe (Figure 3). 

In both cases, SES was also able to leverage cuML’s implementation for HDBSCAN, a clustering method well-suited to complex datasets, to automatically label each molecule’s “home galaxy”. This greatly facilitates the automation of molecular search efforts, as code can use cluster labels as a hook for stratified sampling, ensuring all of the universe’s galaxies are adequately represented.
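
The use of cluster labels as a hook for stratified sampling can be sketched as follows. The function names, the `min_cluster_size` value, and the toy labels are assumptions for illustration. The clustering helper assumes a GPU with RAPIDS cuML and a NumPy array as input, while the sampling helper is plain Python. HDBSCAN marks noise points with the label -1, which this sketch skips by default.

```python
import random
from collections import defaultdict


def label_galaxies(embedding, min_cluster_size=50):
    """Cluster 2-D coordinates with cuML's HDBSCAN; returns a list of int labels."""
    # Imported lazily so the sampling helper below works without a GPU.
    from cuml.cluster import HDBSCAN

    labels = HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(embedding)
    return [int(label) for label in labels]


def stratified_sample(labels, per_cluster, seed=0, skip_noise=True):
    """Pick up to `per_cluster` row indices from every cluster label."""
    by_cluster = defaultdict(list)
    for index, label in enumerate(labels):
        if skip_noise and label == -1:
            continue  # -1 is HDBSCAN's label for noise points
        by_cluster[label].append(index)

    rng = random.Random(seed)  # seeded so the selection is reproducible
    return {
        label: sorted(rng.sample(indices, min(per_cluster, len(indices))))
        for label, indices in sorted(by_cluster.items())
    }


# Toy labels standing in for HDBSCAN output on seven molecules.
labels = [0, 0, 0, 1, 1, -1, 2]
picked = stratified_sample(labels, per_cluster=2)
print(picked)
```

Sampling a fixed number of candidates per cluster keeps small “galaxies” from being drowned out by large ones, which is the point of stratifying.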

To make the Molecular Map complete for battery electrolyte materials, a sub-universe for anions (which constitute part of the new salt materials) is also under construction. As the UMAP here indicates, an even smaller portion of the anionic universe has been explored by human scientists thus far.
Figure 3. The anionic subset of the Molecular Universe: 50,000 anions that passed the “Electrolyte Constraints”, drawn from a structural database of 1 billion anions (shown as light blue pixels), plotted in UMAP and overlain by the known anions commonly used in Li-ion or Li-metal batteries (PF6, BF4, OTf, FSI, TFSI, and Beti, as marked).

Qichao Hu, CEO of SES AI, explains, “The goal of our Molecular Universe effort is to map the properties of small molecules so that we can develop better energy storage devices—for flying cars, humanoid robots, data centers, and more. With this collaboration with NVIDIA using the latest computation hardware and software, we’ve accelerated this process from several thousand years to just a few months.”

The Molecular Universe MU-0 tool was recently released and more details can be found on the website.

Empowering researchers worldwide

NVIDIA CUDA-based libraries are empowering researchers worldwide to accelerate materials discovery:

  • Use the NVIDIA cuML Python library to accelerate your machine learning workflows without any code changes.
  • Sign up to receive a notification when the NVIDIA Batched Geometry Relaxation NIM microservice is available for download.
  • Build custom generative models with NVIDIA NeMo.

Acknowledgement

The SES team wants to thank the NVIDIA team for their support. In particular, we are grateful to Jenn Yonemitsu and Brian Tepera for their invaluable help.
