
Build an AI Scientist for Life Science Discovery with NVIDIA BioNeMo Agent Toolkit

AI scientists are emerging as a new interface for scientific computing. These agents can read papers, write code, generate hypotheses, call APIs, inspect files, and iterate on results. But science isn’t software engineering. There is no test suite that turns green when a hypothesis is correct; discovery is iterative, uncertain, and grounded in the physical world. You can’t take a general coding agent, point it at biology, and expect new medicines. In biomolecular research, the ceiling of an AI scientist’s capabilities is set by the scientific tools it can use reliably, correctly, and efficiently.

A general‑purpose agent may recognize when protein folding, molecular docking, molecular generation, sequence design, multiple sequence alignment, protein backbone generation, or genome modeling is relevant to a task. But it still needs help knowing which AI model to call, how to format the request, which input parameters matter, what artifact to expect, and how to interpret the result.

NVIDIA BioNeMo is the platform that closes that gap for any agent. It turns the NVIDIA accelerated digital‑biology stack into tools an AI scientist can use:

  • An accelerated tool layer: NVIDIA NIM and BioNeMo open models deliver core biomolecular capabilities as optimized, callable services, including structure prediction, docking, molecular generation, sequence design, alignment/search, and genomics. These capabilities are accelerated by NVIDIA libraries such as cuEquivariance (structure models) and Parabricks (genomics) rather than simply running on NVIDIA hardware.
  • Agent-ready interfaces: NVIDIA BioNeMo Skills package each capability as a documented, callable resource so an agent can choose the right tool, send a valid request, and read the result. Each skill documents the model’s purpose, required inputs, optional parameters, expected artifacts, and failure modes. Model Context Protocol (MCP) server wrappers expose open models that aren’t yet packaged as NIM through the same agent‑callable pattern.

This post walks through the hands‑on process of pointing an agent at the platform, giving it a BioNeMo Skill to operate a model, choosing where that model runs, and measuring whether the agent’s loop actually improves (see Figure 1, below). Skills are the focus here because they are the most direct way to turn a model into an agent tool.

Diagram showing an AI scientist workflow that starts from a scientific goal and iterates through model selection, input preparation, model execution, output inspection, and explanation, then produces a grounded result by calling BioNeMo capabilities such as fold, dock, search, score, and simulate
Figure 1. An AI scientist iterates from a scientific goal, selecting a model, preparing inputs, running it, inspecting outputs, and explaining results with caveats; this produces a grounded result by calling BioNeMo capabilities such as fold, dock, search, score, and simulate

Prerequisites

  • Access to the BioNeMo Agent Toolkit Skills repository (https://github.com/NVIDIA-BioNeMo/bionemo-agent-toolkit), including the skills and NIM references
  • An agent runtime such as Claude or Codex
  • An NVIDIA API key for hosted BioNeMo NIM endpoints
  • (Optional) a GPU node for local NIM deployment

Build an AI scientist with BioNeMo

1. Plan the scientific workflow

Begin with the workflow the AI scientist should perform. A useful biomolecular AI scientist can select a model, prepare valid inputs, run it, inspect outputs, and explain results with scientific caveats. 

For example, an AI scientist might:

  • Generate a multiple sequence alignment with MMseqs2 (MSA Search)
  • Fold a peptide sequence with Boltz‑2 or OpenFold3
  • Generate molecules with GenMol
  • Dock a ligand against a protein target with DiffDock
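
One way to make such a plan explicit before any model is called is a small task-to-model map. The sketch below is illustrative only: the task keys (`msa`, `fold`, and so on) and the `plan` helper are hypothetical names, not part of the BioNeMo Agent Toolkit. The model names are the ones listed above.

```python
# Task-to-model map built from the example list above.
# The task keys are hypothetical labels chosen for this sketch.
CANDIDATE_MODELS = {
    "msa": ["MMseqs2 (MSA Search)"],
    "fold": ["Boltz-2", "OpenFold3"],
    "generate_molecules": ["GenMol"],
    "dock": ["DiffDock"],
}


def plan(tasks):
    """Return (task, candidate models) pairs, failing fast on unmapped tasks."""
    unknown = [task for task in tasks if task not in CANDIDATE_MODELS]
    if unknown:
        raise KeyError(f"no model mapped for: {unknown}")
    return [(task, CANDIDATE_MODELS[task]) for task in tasks]


if __name__ == "__main__":
    for task, models in plan(["msa", "fold", "dock"]):
        print(f"{task}: {' or '.join(models)}")
```

Failing on an unmapped task, rather than guessing a model, keeps the agent from silently substituting a tool the workflow never called for.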

The platform supplies the deployable model layer for each step. NIM packages biomolecular AI models, including structure prediction, molecular generation, docking, sequence analysis, design, and genomics (for example Evo 2 and Parabricks), as optimized, callable services that run through hosted endpoints or local infrastructure. 

BioNeMo Skills sit on top of these services to make each capability usable by an agent, describing the model’s purpose, required inputs, optional parameters, expected artifacts, and failure modes so the AI scientist can choose the right tool, prepare a valid request, and interpret outputs such as CIF, SDF, FASTA, A3M, or SMILES files.
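
To show how those five fields help an agent in practice, here is a minimal sketch of a skill summary held in memory and used to check a request before it is sent. `SkillCard`, its field names, and the example values (including the `msa` parameter) are assumptions made for illustration; they are not the toolkit's actual schema or any model's real parameter list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillCard:
    """Hypothetical summary of one skill; not the toolkit's real schema."""
    name: str
    purpose: str
    required_inputs: tuple[str, ...]
    optional_parameters: tuple[str, ...] = ()
    expected_artifacts: tuple[str, ...] = ()
    failure_modes: tuple[str, ...] = ()

    def check_request(self, request: dict) -> list[str]:
        """Return a list of problems; an empty list means the request looks valid."""
        problems = [f"missing required input: {key}"
                    for key in self.required_inputs if key not in request]
        allowed = set(self.required_inputs) | set(self.optional_parameters)
        problems += [f"unknown parameter: {key}"
                     for key in sorted(request) if key not in allowed]
        return problems


# Example values are illustrative only.
folding = SkillCard(
    name="openfold3",
    purpose="Predict a 3D structure from a protein sequence",
    required_inputs=("sequence",),
    optional_parameters=("msa",),
    expected_artifacts=("CIF",),
    failure_modes=("low-confidence prediction",),
)

if __name__ == "__main__":
    print(folding.check_request({"sequence": "MKTVRQERLKSIVR"}))
    print(folding.check_request({"seed": 1}))
```

Catching a malformed request locally costs nothing, whereas a failed model call costs the agent a round trip and a retry.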

2. Point your agent at the platform

Start with discovery, not a single endpoint. Point the agent at https://github.com/NVIDIA-BioNeMo/bionemo-agent-toolkit so it can enumerate the available capabilities and learn the platform’s structure before it acts. From there, the relevant skill (or MCP server wrapper, for open models not yet packaged as NIMs) tells the agent how to use a specific model: what it does, when to use it, how to prepare the request, and what artifact to expect.

Treat a BioNeMo Skill as an agent capability, not just an endpoint wrapper. The same prompt pattern applies to any model in the platform.

3. Choose hosted or local deployment

NIM gives teams flexible deployment options. Use hosted NIM endpoints when the AI scientist needs fast access for non-production code without managing infrastructure, GPU scheduling, container setup, model warmup, or large supporting databases. This makes hosted the best starting point for broad agent access, evaluation, occasional calls, or workflows that depend on infrastructure-heavy services such as MSA search.

Use local NIM deployment when the workflow makes repeated calls to the same model, or needs lower warm latency, data locality, or tighter runtime control. This suits iterative agent loops that generate candidates, inspect outputs, adjust parameters, and rerun many times.

A practical rule: Start hosted for ease of access and scale, then move selected models local when latency, throughput, security, or repeated iteration justify the added operational control. In internal testing on a single GPU, moving the right models local reduced warm per‑call latency for repeat‑call workloads, while one‑off calls were best served by hosted endpoints.

A skill or MCP wrapper should support either path by telling the agent where the model is available, how to call it, and what artifact to expect.
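
The rule of thumb in this section can be written down as a small decision helper. This is a sketch under stated assumptions: the `Workload` fields and the `repeat_call_threshold` default of 20 are hypothetical and not values from NVIDIA's testing. Tune them to your own latency and throughput measurements.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of how an agent will use one model."""
    expected_calls: int                   # calls to the same model in one session
    needs_data_locality: bool = False     # data must stay on local infrastructure
    needs_low_warm_latency: bool = False  # tight per-call latency budget
    has_local_gpu: bool = False           # a GPU node is available for local NIM


def choose_deployment(w: Workload, repeat_call_threshold: int = 20) -> str:
    """Return "hosted" or "local" following the start-hosted rule of thumb."""
    wants_local = (
        w.expected_calls >= repeat_call_threshold
        or w.needs_data_locality
        or w.needs_low_warm_latency
    )
    # Without a GPU node, local deployment is not an option, so stay hosted.
    return "local" if wants_local and w.has_local_gpu else "hosted"


if __name__ == "__main__":
    print(choose_deployment(Workload(expected_calls=1)))
    print(choose_deployment(Workload(expected_calls=500, has_local_gpu=True)))
```

Hosted is the default on purpose: a workload has to give a concrete reason (repeat calls, locality, latency) before it takes on the operational cost of local deployment.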

4. Use a model through a skill

Use the same prompt structure for any BioNeMo Skill. The example below uses OpenFold3, but it also applies to Boltz‑2, DiffDock, GenMol, ProteinMPNN, MSA Search, RFdiffusion, Evo 2, and other NIMs for biology.

For a hosted OpenFold3 NIM endpoint:

Use the OpenFold3 BioNeMo Skill to fold MKTVRQERLKSIVR with the NVIDIA API
endpoint at https://build.nvidia.com/openfold3

For a local OpenFold3 NIM deployment:

Use the OpenFold3 BioNeMo Skill to fold MKTVRQERLKSIVR with the local NIM
endpoint at http://localhost:8000 (or the endpoint where NIM is deployed)
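
When these prompts are generated programmatically, for example inside a parameter sweep, a small helper can fill in the pattern and reject malformed sequences before the agent sees them. The sketch below is an assumption-laden illustration: `build_fold_prompt` is a hypothetical name, and restricting input to the 20 standard amino-acid letters is a simplification that a given model may not require.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def build_fold_prompt(skill: str, sequence: str, endpoint: str,
                      local: bool = False) -> str:
    """Fill in the folding prompt pattern shown above (hypothetical helper)."""
    seq = sequence.strip().upper()
    bad = sorted(set(seq) - VALID_RESIDUES)
    if not seq or bad:
        raise ValueError(f"not a standard protein sequence: {bad or 'empty'}")
    where = "local NIM endpoint" if local else "NVIDIA API endpoint"
    return (f"Use the {skill} BioNeMo Skill to fold {seq} "
            f"with the {where} at {endpoint}")


if __name__ == "__main__":
    print(build_fold_prompt("OpenFold3", "MKTVRQERLKSIVR",
                            "https://build.nvidia.com/openfold3"))
    print(build_fold_prompt("OpenFold3", "MKTVRQERLKSIVR",
                            "http://localhost:8000", local=True))
```

Only the endpoint and its description change between the hosted and local variants, which is what lets the same skill serve both deployment paths.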

Accelerated tools, not just wrappers

The platform’s value is that its tools are fast and production‑ready when called. BioNeMo NIMs provide accelerated, easily deployed microservices for many of the most commonly used models. BioNeMo NIM Skills simplify deploying these microservices, so agents can run models locally or call hosted services without the dependency management required to build and deploy models from source.

AI scientists work in iterative loops: generate candidates, inspect outputs, adjust parameters, rerun. BioNeMo NIMs, enabled by BioNeMo NIM Skills, improve this loop by streamlining both deployment and downstream inference tasks, enabling rapid iteration. We measure this by benchmarking the quality of the agent’s results and the efficiency of each run, comparing the agent with the skill against the same agent without it.

Another metric is the agent’s efficiency in using the tools required to build these iterative workflows. Here, we measure the token efficiency of an agent with and without skills. By factoring in correctness as shown in Figure 2, above, we can assess an agent’s overall performance by comparing the number of passing assertions (individual steps that compose the overall task) with the number of tokens required. 

When using BioNeMo NIM Skills, an agent averages a 2x improvement in the number of passing assertions per token consumed.

Evaluate accuracy with task‑level outcomes: Did the agent select the right model, prepare valid inputs, return the expected artifact, and explain the result correctly? Evaluate efficiency with single‑call latency, parameter‑sweep latency, and token use. Together these show whether the skill helps the agent produce better scientific results with less setup, fewer retries, and faster iteration (see Figure 2, above).
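
The assertions-per-token comparison is simple to reproduce on your own runs. The sketch below shows one way to compute it. The function names are hypothetical, the per-1,000-token scaling is a readability choice, and the numbers in the usage example are made up for illustration; they are not the measurements reported in this post.

```python
def assertions_per_kilotoken(passed: int, tokens: int) -> float:
    """Passing assertions per 1,000 tokens consumed."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return 1000 * passed / tokens


def efficiency_gain(with_skill: tuple[int, int],
                    without_skill: tuple[int, int]) -> float:
    """Ratio of the two efficiencies; each argument is (passed, tokens)."""
    baseline = assertions_per_kilotoken(*without_skill)
    if baseline == 0:
        raise ValueError("baseline passed no assertions; ratio is undefined")
    return assertions_per_kilotoken(*with_skill) / baseline


if __name__ == "__main__":
    # Made-up run totals, purely to exercise the functions.
    gain = efficiency_gain(with_skill=(8, 20_000), without_skill=(6, 30_000))
    print(f"efficiency gain: {gain:.2f}x")
```

Because the metric divides correctness by cost, an agent that burns fewer tokens while failing more assertions does not score better, which is the behavior you want from an efficiency measure.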

All metrics reported here were measured using Codex CLI with GPT-5.5 fast. All BioNeMo NIM skills are designed to be agent‑agnostic, so similar agent performance can be expected with other backends and models.

Troubleshooting

  • If a predicted structure appears low‑confidence, check whether the sequence, MSA, templates, or constraints are biologically appropriate.
  • If docking or binding results look implausible, check the biological setup before trusting the pose or score.
  • If generated molecules or protein designs look promising, filter them with downstream scientific criteria before advancing.
  • If a NIM auto‑selects an incompatible optimized profile for your GPU, set the model profile explicitly rather than relying on auto‑selection.
  • Endpoints at build.nvidia.com are for small‑scale development and testing only, not production‑grade inference.

Going further

BioNeMo turns the NVIDIA accelerated biomolecular stack into callable, discoverable tools that any agent can use to do real biology. The accelerated model layer (NIM and open models, accelerated by libraries such as cuEquivariance and Parabricks) supplies the capability; BioNeMo Skills and MCP wrappers teach an agent how to use each model correctly; and a single repository lets an agent discover the whole platform on day zero. 

For teams building complete agents, the broader platform, including NVIDIA Nemotron and the NVIDIA NeMo Agent Toolkit for orchestration and memory, extends the same approach beyond single tool calls.

The workflow starts with the scientific task, then maps each step to the appropriate model, interface, and deployment path. Begin with hosted NVIDIA NIM endpoints for broad access and ease of use, then move selected models local when latency, throughput, security, or repeated iteration requires more control. This turns biomolecular AI from isolated model calls into an iterative research loop. 

Getting started 

With BioNeMo, an AI scientist can use structure prediction, molecular generation, docking, sequence analysis, design, and genomics as callable tools, moving from prompt to hypothesis, from hypothesis to model call, and from model output to the next scientific decision. 

Point your agent at https://github.com/NVIDIA-BioNeMo/bionemo-agent-toolkit and hand it a BioNeMo Skill to get started.
