Accelerate Protein Structure Inference Over 100x with NVIDIA RTX PRO 6000 Blackwell Server Edition

The race to understand protein structures has never been more critical. From accelerating drug discovery to preparing for future pandemics, the ability to predict how proteins fold determines the capacity to solve humanity’s most pressing biological challenges. Since the release of AlphaFold2, the use of AI inference to determine protein structures has skyrocketed. Unoptimized tools for protein structure inference can cost organizations millions in lost research time and prolonged compute utilization.

The new NVIDIA RTX PRO 6000 Blackwell Server Edition GPU fundamentally changes this. Despite the AlphaFold2 breakthrough, CPU-bound multiple sequence alignment (MSA) generation and inefficient GPU inference remained rate-limiting steps. Building on previous collaborative efforts, new accelerations developed by NVIDIA Digital Biology Research labs enable faster-than-ever protein structure inference using OpenFold at no accuracy cost compared to AlphaFold2.

In this post, we will show how to run large-scale protein analysis using RTX PRO 6000 Blackwell Server Edition GPUs, providing unprecedented protein structure inference performance to software platforms, cloud providers, and research institutions.

Video 1. The NVIDIA RTX PRO 6000 Blackwell Server Edition GPU sets a new benchmark for protein structure inference

Why do speed and scale matter in protein structure prediction?

Protein folding sits at the intersection of the most computationally demanding workloads in computational biology. Modern drug discovery pipelines require analyzing thousands of protein structures. At the same time, enzyme engineering projects demand rapid iteration cycles to optimize biological functions, and agricultural biotech applications require screening massive protein libraries to develop climate-resilient crops.

The computational challenge can become immense: a single protein structure prediction can involve metagenomic-scale MSAs, iterative refinement steps, and ensemble calculations that typically require hours of compute time. When scaled across entire proteomes or drug target libraries, these workloads become prohibitively time-consuming on CPU-based infrastructures.

For example, in a direct comparison of multiple sequence alignment tools, MMseqs2-GPU completed alignments 177x faster on a single NVIDIA L40S GPU than CPU-based JackHMMER on a 128-core CPU, and up to 720x faster when distributed across eight NVIDIA L40S GPUs. These speedups show how GPU acceleration alleviates computational bottlenecks in protein bioinformatics.
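
As a rough check on how the multi-GPU result scales, the two quoted speedups can be compared directly. The sketch below takes both figures at face value. The 720x figure is stated as an upper bound, and the benchmark conditions behind the two numbers may differ, so the derived efficiency is illustrative only.

```python
# Speedups over CPU-based JackHMMER, as quoted in the text.
single_gpu_speedup = 177.0   # one NVIDIA L40S
eight_gpu_speedup = 720.0    # "up to", across eight NVIDIA L40S GPUs
num_gpus = 8

# How much faster eight GPUs are than one, and how close that is to linear.
scaling = eight_gpu_speedup / single_gpu_speedup
efficiency = scaling / num_gpus

print(f"{num_gpus} GPUs give {scaling:.2f}x over 1 GPU "
      f"({efficiency:.0%} of linear scaling)")
# prints: 8 GPUs give 4.07x over 1 GPU (51% of linear scaling)
```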

How does NVIDIA enable the fastest protein structure AI available?

Building on recent releases like cuEquivariance and the Boltz-2 NIM microservice, the NVIDIA Digital Biology Research lab validated groundbreaking performance improvements for OpenFold using RTX PRO 6000 Blackwell Server Edition and NVIDIA TensorRT across industry-standard benchmarks (Figure 1).

By leveraging new instructions and TensorRT, MMseqs2-GPU and OpenFold on RTX PRO 6000 Blackwell deliver transformational performance for protein structure prediction, folding over 138x faster than AlphaFold2 and approximately 2.8x faster than ColabFold while maintaining identical TM-scores.

Two accelerations account for this result. First, MMseqs2-GPU on RTX PRO 6000 Blackwell runs approximately 190x faster than JackHMMER and HHblits on a dual-socket AMD EPYC 7742 CPU. Second, bespoke TensorRT optimizations targeting OpenFold increased its inference speed 2.3x compared to baseline OpenFold. Validated on 20 CASP14 protein targets, these benchmarks establish RTX PRO 6000 Blackwell as a breakthrough solution for end-to-end protein structure prediction.

Eliminate memory bottlenecks

With 96 GB of high-bandwidth memory (1.6 TB/s), RTX PRO 6000 Blackwell can fold entire protein ensembles and large MSAs while keeping the full workflow GPU-resident. Its Multi-Instance GPU (MIG) functionality enables a single RTX PRO 6000 Blackwell to act like four GPUs, each powerful enough to outperform an NVIDIA L4 Tensor Core GPU. This allows multiple users or workflows to share a server without compromising speed or accuracy.

Here’s a complete example demonstrating how to leverage RTX PRO 6000 Blackwell’s performance for rapid protein structure prediction. The first step is deploying the OpenFold2 NIM on your local machine.

# See https://build.nvidia.com/openfold/openfold2/deploy for
# instructions to configure your docker login, NGC API Key, and
# environment for running the OpenFold NIM on your local system.

# Run this in a shell, providing the username below and your NGC API Key
$ docker login nvcr.io
Username: $oauthtoken
Password: <PASTE_API_KEY_HERE>

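# Replace the placeholder with your key; the quotes keep the shell from
# interpreting the angle brackets as redirections.
export NGC_API_KEY="<your personal NGC key>"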

# Configure local NIM cache directory so the NIM model download can be reused
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
sudo chmod 0777 -R "$LOCAL_NIM_CACHE"

# Then launch the NIM container, in this case using GPU device ID 0.
docker run -it \
    --runtime=nvidia \
    --gpus='"device=0"' \
    -p 8000:8000 \
    -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE":/opt/nim/.cache \
    nvcr.io/nim/openfold/openfold2:latest

# It can take some time to download all model assets on the initial run.
# You can check the status using the built-in health check.  This will
# return {"status": "ready"} when the NIM endpoint is ready for inference.
curl http://localhost:8000/v1/health/ready
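
If you would rather wait for the endpoint from a script than rerun curl by hand, the same health check can be polled in Python. The following is a minimal sketch using only the standard library. The helper names `is_ready` and `wait_until_ready`, the 30-minute default limit, and the 10-second polling interval are illustrative assumptions, not part of the NIM. The sketch treats any HTTP 200 from the health URL as ready and anything else, including a refused connection, as not ready yet.

```python
import time
import urllib.request


def is_ready(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError are OSError subclasses).
        return False


def wait_until_ready(probe, max_wait_s=1800.0, poll_s=10.0,
                     sleep=time.sleep, clock=time.monotonic) -> bool:
    """Call probe() until it returns True or max_wait_s has elapsed.

    probe is any zero-argument callable returning a bool; sleep and clock
    are injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + max_wait_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

Calling `wait_until_ready(lambda: is_ready("http://localhost:8000/v1/health/ready"))` before sending any requests blocks until the endpoint reports ready, and returns False if the limit is reached first.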

Once the NIM has been deployed locally, you can construct inference requests and use the local endpoint to generate protein structure predictions.

#!/usr/bin/env python3

from pathlib import Path

import requests

# ----------------------------
# parameters
# ----------------------------
output_file = Path("output1.json")
selected_models = [1, 2]

# SARS-CoV-2 proteome example
# Spike protein (1273 residues) — critical for vaccine development
# Adjacent string literals are concatenated by Python, so the sequence can be
# wrapped across lines without embedding newline characters.
sequence = (
    "MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFAST"
    "EKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLEGKQGNFKNL"
    "REFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVD"
    "CALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLC"
    "FTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCY"
    "FPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITP"
    "CSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARS"
    "VASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQV"
    "KQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWT"
    "FGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLD"
    "KVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHD"
    "GKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQK"
    "EIDRLNEVAKNLNESLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT"
)

# Build the request payload from the parameters defined above.
data = {
    "sequence": sequence,
    "selected_models": selected_models,
    "relax_prediction": False,
}
print(data)

# ---------------------------------------------------------
# Submit
# ---------------------------------------------------------
url = "http://localhost:8000/biology/openfold/openfold2/predict-structure-from-msa-and-template"
print("Making request...")
response = requests.post(url=url, json=data)

# ---------------------------------------------------------
# View response
# ---------------------------------------------------------
if response.status_code == 200:
    output_file.write_text(response.text)
    print(f"Response output to file: {output_file}")

else:
    print(f"Unexpected HTTP status: {response.status_code}")
    print(f"Response: {response.text}")
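
The script above saves the raw response text without looking inside it. Before writing code that depends on particular fields, it can help to see what the saved JSON contains. The sketch below makes no assumption about the NIM response schema: it walks whatever JSON is in `output1.json` and reports keys, value types, and sizes instead of dumping long strings. The `describe` helper and its `max_depth` parameter are hypothetical names introduced for this illustration.

```python
import json
from pathlib import Path


def describe(value, max_depth=2, depth=0):
    """Return a compact, schema-agnostic summary of a parsed JSON value."""
    if isinstance(value, dict):
        if depth >= max_depth:
            return f"object with {len(value)} keys"
        return {k: describe(v, max_depth, depth + 1) for k, v in value.items()}
    if isinstance(value, list):
        if not value:
            return "empty array"
        if depth >= max_depth:
            return f"array of {len(value)} items"
        # Summarize only the first element; arrays are usually homogeneous.
        return {"array_len": len(value),
                "first_item": describe(value[0], max_depth, depth + 1)}
    if isinstance(value, str):
        return f"string of {len(value)} chars"
    return type(value).__name__  # int, float, bool, or NoneType


# Summarize the file written by the request script, if it exists.
output_path = Path("output1.json")
if output_path.exists():
    summary = describe(json.loads(output_path.read_text()))
    print(json.dumps(summary, indent=2))
```

Raising `max_depth` shows more nesting once you know which part of the response you want to inspect.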

Get started accelerating protein AI workflows

Whereas AlphaFold2 once required heterogeneous high-performance computing nodes, NVIDIA accelerations for protein structure prediction on RTX PRO 6000 Blackwell, including modular components in cuEquivariance, TensorRT, and MMseqs2-GPU, enable folding on a single server at world-class speed. This makes proteome-scale folding accessible to any lab or software platform, with the fastest time-to-prediction to date.

Whether you’re developing software platforms for drug discovery, building agricultural biotech solutions, or conducting pandemic preparedness research, the unprecedented performance of RTX PRO 6000 Blackwell will transform your computational biology workflows. The power of RTX PRO 6000 Blackwell Server Edition is available today in NVIDIA RTX PRO Servers from global system makers as well as in cloud instances from leading cloud service providers.

Ready to get started? Find a partner for NVIDIA RTX PRO 6000 Blackwell Server Edition and experience protein folding at unprecedented speed and scale.

Acknowledgments

We’d like to thank the researchers from NVIDIA, University of Oxford, and Seoul National University who contributed to this research, including Christian Dallago, Alejandro Chacon, Kieran Didi, Prashant Sohani, Fabian Berressem, Alexander Nesterovskiy, Robert Ohannessian, Mohamed Elbalkini, Jonathan Cogan, Ania Kukushkina, Anthony Costa, Arash Vahdat, Bertil Schmidt, Milot Mirdita, and Martin Steinegger.