Predict Protein Structures and Properties with Biomolecular Large Language Models

Dec 08, 2022

By Vanessa Braunstein and Nate Bradford

The NVIDIA BioNeMo service is now available for early access. At GTC Fall 2022, NVIDIA unveiled BioNeMo, a domain-specific framework and service for training and serving biomolecular large language models (LLMs) for chemistry and biology at supercomputing scale across billions of parameters.

The BioNeMo service is optimized for chemical, proteomic, and genomic applications. It is designed to support molecular data in SMILES notation for chemical structures and in FASTA format for amino acid and nucleic acid sequences (proteins, DNA, and RNA).
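To make the two input formats concrete, here is a minimal sketch in plain Python. It is not part of the BioNeMo service: the `parse_fasta` helper and the record names are hypothetical and exist only for illustration. A SMILES string is a single line of text describing a molecule, while a FASTA file pairs a `>` header line with one or more sequence lines.

```python
from typing import Dict, Iterable

# A SMILES string encodes a chemical structure as text; "CCO" is ethanol.
ethanol_smiles = "CCO"


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} mapping.

    Hypothetical helper for illustration; not a BioNeMo API.
    """
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: use the first token after '>' as the record ID.
            parts = line[1:].split()
            current_id = parts[0] if parts else f"record_{len(records)}"
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may wrap, so concatenate them under the header.
            records[current_id] += line.upper()
    return records


# Made-up records: one amino acid sequence and one nucleic acid sequence.
example = """>demo_protein hypothetical example record
MKTAYIAKQR
QISFVKSHFS
>demo_dna another hypothetical record
ATGGCCATTG
"""

records = parse_fasta(example.splitlines())
for record_id, sequence in records.items():
    print(record_id, len(sequence), sequence)
```

The same parsing logic covers protein, DNA, and RNA records because FASTA only distinguishes them by the alphabet used in the sequence lines.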

With the BioNeMo service, scientists and researchers now have access to pretrained biomolecular LLMs through a cloud API, enabling them to predict protein structures, develop workflows, and fit downstream task models from LLM embeddings.

The BioNeMo service is a turnkey cloud solution for AI drug discovery pipelines that can be used in your browser or through API endpoints. The API endpoints let scientists get started quickly with AI drug discovery workflows based on large language model architectures. The service also provides a UI Playground for trying these models quickly, and the same models can be called through the API and integrated into your applications.

The BioNeMo service contains the following features:

Fully managed, browser-based service with API endpoints for protein LLMs
Accelerated OpenFold model for fast 3D protein structure predictions
ESM-1nv LLM for protein embeddings for downstream tasks
Interactive inference and visualization of protein structures through a graphical user interface (GUI)
Programmatic access to pretrained models through the API

About the models

ESM-1nv, based on Meta AI’s state-of-the-art ESM-1b, is a large language model for the evolutionary-scale modeling of proteins. It is based on the BERT architecture and trained on millions of protein sequences with a masked language modeling objective. ESM-1nv learns the patterns and dependencies between amino acids that ultimately give rise to protein structure and function.
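The masked language modeling objective mentioned above can be sketched in a few lines. This is a simplified illustration, not the ESM-1nv training code: the `mask_sequence` function, the `<mask>` token string, and the 15% masking rate (the rate used in the original BERT recipe) are assumptions, and the sketch omits BERT's random-replacement and keep-as-is variants. The model would be trained to recover the hidden residues from the surrounding context.

```python
import random
from typing import Dict, List, Tuple

MASK = "<mask>"  # assumed mask token string, for illustration only


def mask_sequence(
    sequence: str, mask_prob: float = 0.15, seed: int = 0
) -> Tuple[List[str], Dict[int, str]]:
    """Hide a random subset of residues and return the training targets.

    Returns the token list with some residues replaced by MASK, plus a
    {position: original_residue} mapping that a model would learn to predict.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    tokens = list(sequence)    # one token per amino acid
    targets: Dict[int, str] = {}
    for i, residue in enumerate(sequence):
        if rng.random() < mask_prob:
            targets[i] = residue
            tokens[i] = MASK
    return tokens, targets


# Made-up amino acid sequence used only to demonstrate the masking step.
demo_sequence = "MKTAYIAKQRQISFVKSHFS"
masked_tokens, masked_targets = mask_sequence(demo_sequence)
print(" ".join(masked_tokens))
print(masked_targets)
```

Because the targets come from the sequence itself, no manual labels are needed, which is what makes training on millions of protein sequences practical.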

Embeddings from ESM-1nv can be used to fit downstream task models for protein properties of interest such as subcellular location, thermostability, and protein structure. This is accomplished by training a typically much smaller model with a supervised learning objective to infer a property from ESM-1nv embeddings of protein sequences. Using embeddings from ESM-1nv typically results in far superior accuracy in the final model.
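As a rough sketch of fitting a downstream task model, the example below trains a small logistic regression classifier on fixed-length vectors. Everything here is an assumption made for illustration: the three-dimensional vectors are made-up placeholders standing in for ESM-1nv embeddings (real embeddings are much higher-dimensional and would come from the service), the binary labels are hypothetical, and `train_logistic_regression` and `predict` are not BioNeMo functions.

```python
import math
from typing import List, Sequence, Tuple


def train_logistic_regression(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Fit a binary logistic regression with stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - y                     # gradient of the log loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return 1 if the linear score is positive, else 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


# Placeholder vectors standing in for per-protein embeddings (NOT real data).
toy_embeddings = [
    [0.9, 0.1, 0.2],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.7],
    [0.2, 0.8, 0.9],
]
# Hypothetical binary property labels, one per placeholder protein.
toy_labels = [1, 1, 0, 0]

weights, bias = train_logistic_regression(toy_embeddings, toy_labels)
predictions = [predict(weights, bias, x) for x in toy_embeddings]
print(predictions)
```

The point of the design is the split of responsibilities: the large pretrained model supplies the representation once, and only the small supervised model is trained for each property of interest.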

OpenFold is a faithful reproduction of DeepMind’s AlphaFold-2 model for 3D protein structure prediction from a primary amino acid sequence. This long-standing grand challenge in structural biology reached a significant milestone at CASP14, where AlphaFold-2 achieved near-experimental accuracy for predicted structures. While AlphaFold-2 was developed for a JAX workflow, OpenFold is implemented in PyTorch.

OpenFold in BioNeMo is also trainable, so variants can be created for specialized research. OpenFold achieves accuracy similar to the original model, predicting protein backbones with a median accuracy of 0.96 Å RMSD95, and is up to 6x faster due to changes made in the multiple sequence alignment (MSA) generation step. As a result, drug discovery researchers can obtain 3D protein structure predictions quickly.
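For readers unfamiliar with the accuracy metric, the sketch below shows one simplified way to compute a root-mean-square deviation (RMSD) between predicted and reference atom coordinates, with an optional coverage fraction. It assumes RMSD95 means RMSD over the best-fitting 95% of residues and that the two structures have already been superposed; a full implementation would also perform the structural alignment, which is omitted here. The `rmsd` function and the coordinates are hypothetical and are not taken from OpenFold.

```python
import math
from typing import Sequence

Point = Sequence[float]  # (x, y, z) coordinates in angstroms


def rmsd(predicted: Sequence[Point], reference: Sequence[Point],
         coverage: float = 1.0) -> float:
    """RMSD over the best-fitting fraction of pre-superposed atom pairs.

    coverage=1.0 gives the ordinary RMSD; coverage=0.95 drops the 5% of
    atom pairs with the largest deviation before averaging.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and equal length")
    # Squared distance for each atom pair, smallest first.
    squared = sorted(
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in zip(predicted, reference)
    )
    keep = max(1, round(coverage * len(squared)))
    kept = squared[:keep]
    return math.sqrt(sum(kept) / len(kept))


# Made-up coordinates: 20 atoms on a line, with one badly placed prediction.
reference_coords = [(float(i), 0.0, 0.0) for i in range(20)]
predicted_coords = [(float(i), 0.0, 0.0) for i in range(20)]
predicted_coords[7] = (7.0, 10.0, 0.0)  # a single 10 Å outlier

print(rmsd(predicted_coords, reference_coords))
print(rmsd(predicted_coords, reference_coords, coverage=0.95))
```

Excluding the worst-fitting residues keeps a few flexible or poorly resolved regions from dominating the score, which is why a coverage-limited RMSD is often reported alongside the plain value.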

Get early access to the BioNeMo service

Apply for early access to the BioNeMo service. You’ll be asked to join the NVIDIA Developer Program and fill out a short questionnaire.

About the Authors

About Vanessa Braunstein
Vanessa Braunstein leads healthcare and life science product marketing at NVIDIA for our Clara products in drug discovery, genomics, medical imaging, medical devices, NLP, and smart hospitals. Previously, she was in product development, business development and marketing for radiology, genomics, pharmaceutical, chemistry, and bioinformatics companies using AI. She studied molecular and cell biology, public health, and business at UC Berkeley and UCLA.

About Nate Bradford
Nate Bradford leads design and simulation content production and strategy at NVIDIA. With a background in developer tools and frameworks, he specializes in creating educational resources to enable developers leveraging synthetic data, building generative AI tools, and creating digital twin solutions with OpenUSD.
