
Build an AI Scientist for Life Science Discovery with NVIDIA BioNeMo Agent Toolkit

Jun 23, 2026

By Kyle Tretina, Mahan Salehi, Neel Patel and Kris Kersten


AI-Generated Summary


NVIDIA BioNeMo enables AI scientists to perform biomolecular tasks by providing accelerated, agent-ready interfaces (BioNeMo Skills) that package core capabilities, such as structure prediction, molecular generation, docking, sequence analysis, and genomics, as callable services accessible through both hosted and local NIM deployments.
BioNeMo Skills and Model Context Protocol server wrappers document model purposes, input requirements, expected artifacts, and failure modes, allowing agents to autonomously discover, select, invoke, and interpret biomolecular models with high efficiency and reliability, independent of backend agent or runtime.
Empirical benchmarking (using Codex CLI with GPT-5.5 fast) demonstrated that integrating BioNeMo Skills doubled agents’ token efficiency and increased task completion rates from 57.1% to 100%, transforming agent workflows from isolated model calls into iterative, production-ready research loops.

AI-generated content may summarize information incompletely. Verify important information.

AI scientists are emerging as a new interface for scientific computing. These agents can read papers, write code, generate hypotheses, call APIs, inspect files, and iterate on results. But science isn’t software engineering. There is no test suite that turns green when a hypothesis is correct; discovery is iterative, uncertain, and grounded in the physical world. You can’t take a general coding agent, point it at biology, and expect new medicines. In biomolecular research, the ceiling of an AI scientist’s capabilities is set by the scientific tools it can use reliably, correctly, and efficiently.

A general-purpose agent may recognize that protein folding, molecular docking, molecular generation, sequence design, multiple sequence alignment, protein backbone generation, or genome modeling is relevant to a task. It still needs help knowing which AI model to call, how to format the request, which input parameters matter, what artifact to expect, and how to interpret the result.

NVIDIA BioNeMo is the platform that closes that gap for any agent. It turns the NVIDIA accelerated digital‑biology stack into tools an AI scientist can use:

An accelerated tool layer: NVIDIA NIM and BioNeMo open models deliver core biomolecular capabilities as optimized, callable services, including structure prediction, docking, molecular generation, sequence design, alignment/search, and genomics. These capabilities are accelerated by NVIDIA libraries such as cuEquivariance (structure models) and Parabricks (genomics) rather than simply running on NVIDIA hardware.
Agent-ready interfaces: NVIDIA BioNeMo Skills package each capability as a documented, callable resource so an agent can choose the right tool, send a valid request, and read the result. Each skill documents the model’s purpose, required inputs, optional parameters, expected artifacts, and failure modes. Model Context Protocol (MCP) server wrappers expose open models that aren’t yet packaged as NIM through the same agent-callable pattern.

This post walks through the hands‑on process of pointing an agent at the platform, giving it a BioNeMo Skill to operate a model, choosing where that model runs, and measuring whether the agent’s loop actually improves (see Figure 1, below). Skills are the focus here because they are the most direct way to turn a model into an agent tool.

Figure 1. An AI scientist iterates from a scientific goal, selecting a model, preparing inputs, running it, inspecting outputs, and explaining results with caveats; this produces a grounded result by calling BioNeMo capabilities such as fold, dock, search, score, and simulate.

Prerequisites

Access to the BioNeMo Agent Toolkit Skills repository (https://github.com/NVIDIA-BioNeMo/bionemo-agent-toolkit), including the skills and NIM references
An agent runtime such as Claude or Codex
An NVIDIA API key for hosted BioNeMo NIM endpoints
(Optional) a GPU node for local NIM deployment

Build an AI scientist with BioNeMo

1. Plan the scientific workflow

Begin with the workflow the AI scientist should perform. A useful biomolecular AI scientist can select a model, prepare valid inputs, run it, inspect outputs, and explain results with scientific caveats.

For example, an AI scientist might:

Generate a multiple sequence alignment with MMseqs2 (MSA Search)
Fold a peptide sequence with Boltz‑2 or OpenFold3
Generate molecules with GenMol
Dock a ligand against a protein target with DiffDock

The platform supplies the deployable model layer for each step. NIM packages biomolecular AI models, including structure prediction, molecular generation, docking, sequence analysis, design, and genomics (for example Evo 2 and Parabricks), as optimized, callable services that run through hosted endpoints or local infrastructure.

BioNeMo Skills sit on top of these services to make each capability usable by an agent, describing the model’s purpose, required inputs, optional parameters, expected artifacts, and failure modes so the AI scientist can choose the right tool, prepare a valid request, and interpret outputs such as CIF, SDF, FASTA, A3M, or SMILES files.
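The fields a skill documents can be pictured as a small record that an agent consults before it calls a model. The sketch below is illustrative only: `SkillCard`, `CATALOG`, and the input and artifact names in each entry are assumptions made for this example, not the schema or contents of the real BioNeMo Skills.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillCard:
    """Hypothetical summary of what a skill documents for one model."""
    model: str
    purpose: str
    required_inputs: tuple[str, ...]
    expected_artifacts: tuple[str, ...]
    optional_parameters: tuple[str, ...] = ()
    failure_modes: tuple[str, ...] = ()


# Illustrative entries only. The input names and artifact formats are
# assumptions for this sketch, not values read from the real skills.
CATALOG = [
    SkillCard("MSA Search", "build a multiple sequence alignment",
              ("protein sequence",), ("A3M",)),
    SkillCard("OpenFold3", "predict a 3D structure",
              ("protein sequence",), ("CIF",)),
    SkillCard("GenMol", "generate candidate molecules",
              ("seed molecule",), ("SMILES",)),
    SkillCard("DiffDock", "dock a ligand against a protein target",
              ("protein structure", "ligand"), ("SDF",)),
]


def skills_producing(artifact: str, catalog: list[SkillCard]) -> list[str]:
    """Return the models whose documented output includes the artifact."""
    return [card.model for card in catalog if artifact in card.expected_artifacts]


def missing_inputs(card: SkillCard, available: set[str]) -> list[str]:
    """List the required inputs the agent has not prepared yet."""
    return [name for name in card.required_inputs if name not in available]
```

With a record like this, an agent can answer two planning questions before it spends a model call: which tool yields the artifact the next step needs, and whether every required input is already in hand.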

2. Point your agent at the platform

Start with discovery, not a single endpoint. Point the agent at https://github.com/NVIDIA-BioNeMo/bionemo-agent-toolkit so it can enumerate the available capabilities and learn the platform’s structure before it acts. From there, the relevant skill (or MCP server wrapper, for open models not yet packaged as NIMs) tells the agent how to use a specific model: what it does, when to use it, how to prepare the request, and what artifact to expect.

Treat a BioNeMo Skill as an agent capability, not just an endpoint wrapper. The same prompt pattern applies to any model in the platform.
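To make the discovery step concrete, here is a minimal sketch of how skills could be enumerated in a local clone of the repository. It assumes each skill lives in its own directory containing a file named `SKILL.md`; that file name and layout are assumptions for illustration, so check the actual repository structure before relying on them.

```python
from pathlib import Path


def discover_skills(repo_root: str, marker: str = "SKILL.md") -> dict[str, Path]:
    """Map each skill directory name to its skill file under repo_root.

    ASSUMPTION: one marker file per skill directory. The real repository
    layout and file name may differ.
    """
    found = {}
    for skill_file in sorted(Path(repo_root).rglob(marker)):
        found[skill_file.parent.name] = skill_file
    return found


def summarize(skill_file: Path, max_lines: int = 5) -> str:
    """Return the first non-empty lines of a skill file as a short preview."""
    text = skill_file.read_text(encoding="utf-8")
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join([line for line in lines if line][:max_lines])
```

An agent runtime would normally do this reading itself; the sketch only shows that discovery amounts to listing what exists and previewing each description before choosing a tool.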

3. Choose hosted or local deployment

NIM gives teams flexible deployment options. Use hosted NIM endpoints when the AI scientist needs fast access for non-production code without managing infrastructure, GPU scheduling, container setup, model warmup, or large supporting databases. This makes hosted the best starting point for broad agent access, evaluation, occasional calls, or workflows that depend on infrastructure-heavy services such as MSA search.

Use local NIM deployment when the workflow makes repeated calls to the same model, or needs lower warm latency, data locality, or tighter runtime control. This suits iterative agent loops that generate candidates, inspect outputs, adjust parameters, and rerun many times.

A practical rule: Start hosted for ease of access and scale, then move selected models local when latency, throughput, security, or repeated iteration justify the added operational control. In internal testing on a single GPU, moving the right models local reduced warm per‑call latency for repeat‑call workloads, while one‑off calls were best served by hosted endpoints.

A skill or MCP wrapper should support either path by telling the agent where the model is available, how to call it, and what artifact to expect.
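The start-hosted rule of thumb can be written down as a small decision function. This is a sketch under stated assumptions: the `Workload` fields, the ordering of the checks, and the `repeat_call_threshold` default are illustrative choices, not measured values or part of any NVIDIA tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of how an agent will use one model."""
    expected_calls: int                   # calls to the same model in one loop
    needs_data_locality: bool = False     # data must stay on local infrastructure
    needs_low_warm_latency: bool = False
    needs_heavy_databases: bool = False   # e.g. databases behind MSA search
    has_local_gpu: bool = False


def choose_deployment(w: Workload, repeat_call_threshold: int = 20) -> str:
    """Return "hosted" or "local" following the start-hosted rule of thumb.

    The threshold is an illustrative assumption, not a benchmark result.
    """
    if w.needs_data_locality:
        if not w.has_local_gpu:
            raise ValueError("data locality requires a local GPU node")
        return "local"
    if not w.has_local_gpu or w.needs_heavy_databases:
        return "hosted"  # no infrastructure or large databases to manage
    if w.needs_low_warm_latency or w.expected_calls >= repeat_call_threshold:
        return "local"   # repeated calls benefit from a warm local model
    return "hosted"
```

For example, `choose_deployment(Workload(expected_calls=200, has_local_gpu=True))` selects local deployment, while a one-off call on the same machine stays hosted.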

4. Use a model through a skill

Use the same prompt structure for any BioNeMo Skill. The example below uses OpenFold3, but it also applies to Boltz‑2, DiffDock, GenMol, ProteinMPNN, MSA Search, RFdiffusion, Evo 2, and other NIMs for biology.

For a hosted OpenFold3 NIM endpoint:

Use the OpenFold3 BioNeMo Skill to fold MKTVRQERLKSIVR with the NVIDIA API
endpoint at https://build.nvidia.com/openfold3

For a local OpenFold3 NIM deployment:

Use the OpenFold3 BioNeMo Skill to fold MKTVRQERLKSIVR with the local NIM
endpoint at http://localhost:8000 (or the endpoint where NIM is deployed)
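Because the prompt pattern is the same for every skill, it can be generated from a template. The sketch below rebuilds the two prompts above; `build_skill_prompt` and `check_sequence` are hypothetical helpers written for this example, and the check assumes that only the 20 standard amino-acid letters are acceptable input, which may be stricter than a given model requires.

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def check_sequence(seq: str) -> str:
    """Normalize a protein sequence and reject unexpected characters.

    ASSUMPTION: only the 20 standard amino-acid letters are allowed.
    """
    cleaned = seq.strip().upper()
    bad = sorted(set(cleaned) - STANDARD_RESIDUES)
    if not cleaned or bad:
        raise ValueError(f"invalid protein sequence; unexpected characters: {bad}")
    return cleaned


def build_skill_prompt(model: str, task: str, endpoint: str, local: bool = False) -> str:
    """Compose the one-sentence prompt pattern used in the examples above."""
    where = "local NIM endpoint" if local else "NVIDIA API endpoint"
    return f"Use the {model} BioNeMo Skill to {task} with the {where} at {endpoint}"


sequence = check_sequence("MKTVRQERLKSIVR")
hosted = build_skill_prompt("OpenFold3", f"fold {sequence}",
                            "https://build.nvidia.com/openfold3")
local = build_skill_prompt("OpenFold3", f"fold {sequence}",
                           "http://localhost:8000", local=True)
print(hosted)
print(local)
```

Swapping the model name and task phrase (for instance, a docking task with DiffDock) produces the equivalent prompt for another skill.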

Accelerated tools, not just wrappers

The platform’s value is that it’s fast and production‑ready when called. BioNeMo NIMs provide an accelerated, easily deployed microservice for many of the most commonly used models. BioNeMo NIM Skills simplify deploying these microservices, enabling agents to run locally or use hosted services. This eliminates the complexity of dependency management required to build and deploy models from source.

AI scientists work in iterative loops: generate candidates, inspect outputs, adjust parameters, rerun. BioNeMo NIMs, enabled by BioNeMo NIM Skills, improve this loop by streamlining both deployment and downstream inference tasks, enabling rapid iteration. We measure this by benchmarking the quality of the agent’s results and the efficiency of each run, comparing the agent with the skill against the same agent without it.

A second metric is how efficiently the agent uses the tools needed to build these iterative workflows. Here, we measure the token efficiency of an agent with and without skills. Factoring in correctness, as shown in Figure 2, above, we assess an agent’s overall performance by comparing the number of passing assertions (the individual steps that compose the overall task) with the number of tokens consumed.

With BioNeMo NIM Skills, an agent averages a 2x improvement in passing assertions per token consumed.

Evaluate accuracy with task‑level outcomes: Did the agent select the right model, prepare valid inputs, return the expected artifact, and explain the result correctly? Evaluate efficiency with single‑call latency, parameter‑sweep latency, and token use. Together these show whether the skill helps the agent produce better scientific results with less setup, fewer retries, and faster iteration (see Figure 2, above).
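The two headline measures, task-level pass rate and passing assertions per token, reduce to simple ratios. The following sketch shows the arithmetic; `RunResult` is a hypothetical record, and the numbers are made up so the calculation is easy to follow. They are not the benchmark data behind the figures reported in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """One benchmark run: assertions checked, assertions passed, tokens used."""
    total_assertions: int
    passed_assertions: int
    tokens_used: int


def pass_rate(runs: list[RunResult]) -> float:
    """Fraction of all assertions that passed across the runs."""
    total = sum(r.total_assertions for r in runs)
    return sum(r.passed_assertions for r in runs) / total


def assertions_per_kilotoken(runs: list[RunResult]) -> float:
    """Passing assertions per 1,000 tokens consumed across the runs."""
    tokens = sum(r.tokens_used for r in runs)
    return 1000 * sum(r.passed_assertions for r in runs) / tokens


# Made-up numbers for illustration; they are NOT the benchmark's data.
without_skill = [RunResult(5, 3, 40_000), RunResult(5, 2, 60_000)]
with_skill = [RunResult(5, 5, 45_000), RunResult(5, 5, 55_000)]

# Ratio of token efficiency with the skill to token efficiency without it.
gain = assertions_per_kilotoken(with_skill) / assertions_per_kilotoken(without_skill)
print(f"pass rate: {pass_rate(without_skill):.0%} -> {pass_rate(with_skill):.0%}")
print(f"token-efficiency gain: {gain:.1f}x")
```

Dividing passing assertions by tokens, rather than reporting tokens alone, keeps a run that is cheap but wrong from looking efficient.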

All metrics reported here were measured using Codex CLI with GPT-5.5 fast. All BioNeMo NIM Skills are designed to be agent-agnostic, so similar agent performance can be expected with other backends and models.

Troubleshooting

If a predicted structure appears low‑confidence, check whether the sequence, MSA, templates, or constraints are biologically appropriate.
If docking or binding results look implausible, check the biological setup before trusting the pose or score.
If generated molecules or protein designs look promising, filter them with downstream scientific criteria before advancing.
If a NIM auto‑selects an incompatible optimized profile for your GPU, set the model profile explicitly rather than relying on auto‑selection.
Endpoints at build.nvidia.com are for small-scale development and testing only, not production-grade inference.

Going further

BioNeMo turns the NVIDIA accelerated biomolecular stack into callable, discoverable tools that any agent can use to do real biology. The accelerated model layer (NIM and open models, accelerated by libraries such as cuEquivariance and Parabricks) supplies the capability; BioNeMo Skills and MCP wrappers teach an agent how to use each model correctly; and a single repository lets an agent discover the whole platform on day zero.

For teams building complete agents, the broader platform, including NVIDIA Nemotron and the NVIDIA NeMo Agent Toolkit for orchestration and memory, extends the same approach beyond single tool calls.

The workflow starts with the scientific task, then maps each step to the appropriate model, interface, and deployment path. Begin with hosted NVIDIA NIM endpoints for broad access and ease of use, then move selected models local when latency, throughput, security, or repeated iteration requires more control. This turns biomolecular AI from isolated model calls into an iterative research loop.

Getting started

With BioNeMo, an AI scientist can use structure prediction, molecular generation, docking, sequence analysis, design, and genomics as callable tools, moving from prompt to hypothesis, from hypothesis to model call, and from model output to the next scientific decision.

Point your agent at https://github.com/NVIDIA-BioNeMo/bionemo-agent-toolkit and hand it a BioNeMo Skill to get started.


About the Authors

About Kyle Tretina
Kyle Tretina is a product marketing leader at NVIDIA, focused on advancing AI for digital biology and drug discovery. He drives the strategy and storytelling behind BioNeMo and our work with BioPharma, shaping how next-generation foundation models and GPU-accelerated microservices transform molecular and protein design. With a PhD in molecular microbiology and immunology, Kyle bridges science and strategy, translating breakthroughs in AI, chemistry, and biology into platforms that accelerate discovery for researchers, startups, and pharmaceutical companies worldwide.


About Mahan Salehi
Mahan Salehi is a product management leader at NVIDIA, where he drives the strategy and development of the company’s generative AI software portfolio. He has served as product owner for several NVIDIA flagship products, including Triton Inference Server, NIM, and NeMo microservices. Mahan also spearheads product strategy for NVIDIA BioNeMo, advancing the use of AI foundation models in life sciences and biology. Before joining NVIDIA, Mahan was the CEO and co-founder of an AI startup dedicated to improving mental health treatment. He holds an engineering degree from the University of Toronto.


About Neel Patel
Neel Patel is a drug discovery scientist at NVIDIA, focusing on cheminformatics and computational structural biology. Before joining NVIDIA, Neel was a computational chemist in big pharma, where he worked on structure-based drug design. He holds a Ph.D. from the University of Southern California. He lives in San Diego with his family and enjoys hiking and traveling.


About Kris Kersten
Kris Kersten is a technical marketing engineer at NVIDIA focused on AI, working to scale ML and DL solutions to solve today's most pressing problems in healthcare. Prior to NVIDIA, Kris worked at Cray Supercomputers, studying hardware and software performance characteristics from low-level cache benchmarking to large-scale parallel simulation.
