
Designing Protein Binders Using the Generative Model Proteina-Complexa

Developing new protein-based therapies and catalysts involves the challenging task of designing protein binders, or proteins that bind to a target protein or small molecule. The search space for possible amino acid sequence permutations and resulting 3D protein structures for a designed binder is vast, and achieving strong, specific binding requires careful optimization of the interactions between the protein binder and the target. 
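To put "vast" in perspective, here is a quick back-of-the-envelope calculation. With the 20 standard amino acids, a binder of length L has 20^L possible sequences, even before 3D structure is considered. The lengths below (60 to 120 residues) match the binder-length range used in the PD-L1 example later in this post. The snippet is purely illustrative.

```python
import math

NUM_AMINO_ACIDS = 20  # standard proteinogenic amino acids


def sequence_space_size(length: int) -> int:
    """Number of distinct amino acid sequences of the given length (20**length)."""
    return NUM_AMINO_ACIDS ** length


for length in (60, 90, 120):
    n = sequence_space_size(length)
    # Python ints are arbitrary precision, so n is exact; report its order of magnitude.
    print(f"L={length}: ~10^{math.log10(n):.1f} possible sequences")
```

For a 60-residue binder, the count is already about 1.2 × 10^78 sequences. Exhaustive enumeration is therefore out of the question, which is why generative models and guided search are used instead.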

To address these challenges, NVIDIA has released Proteina-Complexa, a generative model that designs de novo protein binders and enzymes. 

In this post, we detail the key technologies behind Proteina-Complexa, explore primary use cases, and highlight the extensive experimental validation of generated protein binders. We also provide a step-by-step guide for using the command-line interface to generate your own binders.

Key technologies in Proteina-Complexa

Proteina-Complexa's performance relies on three distinct technical components: the base generative model, the training datasets, and the integration of inference-time compute scaling.

Built on top of the La-Proteina model, Proteina-Complexa uses a partially latent flow-matching framework to jointly generate a fully atomistic binder structure (backbone and side-chains) and its amino acid sequence. This approach is called co-design. Backbone alpha-carbon (Cα) atoms are modeled explicitly in 3D Cartesian space. All other atoms (the remaining backbone atoms and the side-chains) and the amino acid sequence are compressed into a learned latent space through an autoencoder. This balances atomic fidelity with computational tractability.
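The sketch below illustrates the general idea of flow matching on a partially latent representation, under simplifying assumptions. Each residue is represented by its explicit Cα coordinates (3 values) concatenated with a small latent vector. That latent vector stands in for the autoencoder's encoding of the remaining atoms and the residue identity. The latent size (`LATENT_DIM = 8`), the linear interpolation path, and the plain Euler sampler are illustrative choices, not Proteina-Complexa's actual architecture or noise schedule. The "oracle" velocity stands in for a trained neural network.

```python
import numpy as np

LATENT_DIM = 8  # hypothetical per-residue latent size (assumption)


def make_state(ca_coords, latents):
    """Concatenate explicit C-alpha coordinates (L, 3) with per-residue latents (L, D)."""
    return np.concatenate([ca_coords, latents], axis=1)


def interpolate(x0, x1, t):
    """Linear path from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Velocity of the linear path; a network would be trained to regress this from (x_t, t)."""
    return x1 - x0


def euler_sample(velocity_fn, x0, num_steps=50):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler updates."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x


# Toy example: 60 residues, a "data" sample x1, and Gaussian noise x0.
rng = np.random.default_rng(0)
L = 60
x1 = make_state(rng.normal(size=(L, 3)) * 10.0, rng.normal(size=(L, LATENT_DIM)))
x0 = rng.normal(size=x1.shape)

# Training pair at a random time t: the loss would compare a network's output to v_target.
t = rng.uniform()
x_t = interpolate(x0, x1, t)
v_target = target_velocity(x0, x1)


def oracle(x, time):
    """Velocity field that knows x1; stands in for a trained model in this toy."""
    return (x1 - x) / (1.0 - time)


x_generated = euler_sample(oracle, x0)
print(np.abs(x_generated - x1).max())  # close to zero, up to floating-point error
```

Generation runs the same integration with a learned velocity field in place of the oracle. The Cα block of the final state gives the coordinates directly. The latent block would be passed through the decoder to recover the side-chains and sequence.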

Historically, computational workflows have treated binder design as a fragmented process, often relying on separate models to generate the backbone and the sequence. While these modular methods can yield strong results, co-design enables reasoning at the atomistic level. By generating the amino acid sequence and the fully atomistic structure (backbone and side-chains) simultaneously, Proteina-Complexa keeps the chemical identity of each residue tightly coupled to the 3D geometry. This integrated generation allows for the design of precise, high-affinity interfaces that are inherently optimized for folding and synthesis.

Training a generative model for protein binder design requires a large amount of structural data on binders and their targets. Proteina-Complexa was trained on over 1 million curated, high-quality experimental and predicted structures from the Protein Data Bank (PDB), the AlphaFold Protein Structure Database, PLINDER, and the recently published Teddymer dataset.

Proteina-Complexa also introduces a new approach to binder design. It unifies a generative model, which has learned the structure of protein binders, with inference-time compute scaling that iteratively optimizes designs during generation. During binder generation, “reasoning” search algorithms (for example, Beam Search and Best-of-N) evaluate and refine candidates at intermediate steps. They invest additional compute on difficult targets while retaining the efficiency of a model that already encodes knowledge of protein structure.
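To illustrate how inference-time search differs from plain sampling, here is a toy sketch of Best-of-N and beam search. `denoise_step` and `reward` are hypothetical stand-ins operating on a 3D toy state. In a real system they would be a stochastic generation step of the model and a structure-based score, such as a predicted interface confidence. Best-of-N scores only finished samples. Beam search scores partial trajectories at intermediate checkpoints and discards weak ones early, which concentrates compute on the most promising designs.

```python
import numpy as np

# Hypothetical stand-ins for illustration only.
TARGET = np.array([1.0, -2.0, 0.5])


def denoise_step(x, rng, step_size=0.1, noise=0.05):
    """Toy stochastic update: drift toward TARGET plus Gaussian noise."""
    return x + step_size * (TARGET - x) + noise * rng.normal(size=x.shape)


def reward(x):
    """Toy reward (higher is better): negative distance to TARGET."""
    return -float(np.linalg.norm(x - TARGET))


def rollout(x, num_steps, rng):
    """Run `num_steps` stochastic generation steps starting from state x."""
    for _ in range(num_steps):
        x = denoise_step(x, rng)
    return x


def best_of_n(n, num_steps, rng, dim=3):
    """Best-of-N: generate n complete samples independently and keep the best-scoring one."""
    samples = [rollout(rng.normal(size=dim), num_steps, rng) for _ in range(n)]
    return max(samples, key=reward)


def beam_search(beam_width, branch, num_steps, checkpoint_every, rng, dim=3):
    """Beam search: score partial trajectories every `checkpoint_every` steps and prune."""
    beams = [rng.normal(size=dim) for _ in range(beam_width)]
    for _ in range(num_steps // checkpoint_every):
        # Expand each surviving beam into `branch` stochastic continuations.
        candidates = [rollout(b, checkpoint_every, rng) for b in beams for _ in range(branch)]
        # Keep only the `beam_width` highest-reward partial designs.
        beams = sorted(candidates, key=reward, reverse=True)[:beam_width]
    return beams[0]


rng = np.random.default_rng(0)
print("Best-of-N reward:", reward(best_of_n(n=8, num_steps=50, rng=rng)))
print("Beam search reward:", reward(beam_search(4, 2, 50, 10, rng)))
```

This toy scores the intermediate state directly. A practical implementation may instead score a predicted clean structure derived from each intermediate state.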

This unified approach improves both the computational efficiency of the model and the quality of the generated binders. Quality is measured by in silico success metrics and by experimentally validated binding to the target.

Use cases for Proteina-Complexa

Proteina-Complexa use cases include protein binders for protein targets and small molecule targets, as well as enzyme design.

Protein binders for protein targets

You can use Proteina-Complexa to design de novo protein binders against disease-relevant targets in indications including oncology, immunology, and neurology. Proteina-Complexa generates binders in full atomic detail (backbone and side-chains) together with their amino acid sequences. This enables direct handoff to experimental testing without intermediate modeling steps.

This use case has been experimentally validated with collaborators from Manifold Bio, Novo Nordisk, Viva Biotech, and Duke University.

Figure 2 shows the following binders generated by Proteina-Complexa: 

  • (a) The challenging three-chain TNF-alpha protein target in surface representation, with the generated binder in purple
  • (b) The Claudin-1 protein target in gray surface representation; the zoom-in shows interface hydrogen bonds (red) between target and binder
  • (c) A small molecule target in gray, with the generated binder in purple and gold

Protein binders for small molecule targets

You can use Proteina-Complexa to design proteins that bind to specific small molecules. Applications include targeted drug delivery, biosensors, and prodrug activation.

This use case has been experimentally validated in collaboration with the University of Cambridge.

Enzyme design

An enzyme active site is the 3D arrangement of amino acid residues responsible for catalyzing a chemical reaction. Given such an active site, you can use Proteina-Complexa to generate structurally diverse proteins that scaffold, or incorporate, that active-site structure. This capability enables de novo enzyme design for industrial biocatalysis, environmental remediation, and synthetic biology applications.
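One common in silico check for motif scaffolding is the RMSD between two sets of coordinates after optimal superposition. One set is the designed protein's residues at the motif positions. The other is the original active-site coordinates. This is an assumption here and not necessarily the exact metric used by the Proteina-Complexa pipeline. Below is a minimal NumPy sketch using the Kabsch algorithm. `motif_rmsd` and its arguments are hypothetical helpers.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between point sets P and Q of shape (N, 3) after optimal rigid superposition of P onto Q."""
    P_c = P - P.mean(axis=0)  # remove translation
    Q_c = Q - Q.mean(axis=0)
    H = P_c.T @ Q_c  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # optimal rotation mapping P onto Q
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def motif_rmsd(design_coords, motif_coords, motif_indices):
    """Compare the design's residues at `motif_indices` with the reference active-site coordinates."""
    return kabsch_rmsd(design_coords[motif_indices], motif_coords)
```

A design that reproduces the active site exactly, up to a rigid rotation and translation, gives an RMSD near zero. Larger values mean the catalytic geometry was distorted.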

Experimental validation

The NVIDIA team validated the de novo proteins generated by Proteina-Complexa in extensive wet lab experiments with multiple external partners. In total, Proteina-Complexa generated tens of millions of initial in silico candidates. After filtering, around 1 million binder candidates were experimentally tested against 133 distinct protein targets. These ranged from well-established benchmark targets to therapeutically relevant targets with no previously reported binders.

These large-scale experiments used state-of-the-art multiplexed phage screening to measure binding hit rates for all candidates across all targets. This makes it one of the largest binder design benchmarks to date.
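A binding hit rate, as used here, is the fraction of tested designs that register as binders for a given target. The sketch below shows that bookkeeping. It assumes a hypothetical list of `(target, design_id, is_hit)` screening records. The record format is an assumption, and it does not reflect the actual screening data.

```python
from collections import defaultdict


def hit_rates(records):
    """Per-target hit rate from an iterable of (target, design_id, is_hit) records."""
    tested = defaultdict(int)
    hits = defaultdict(int)
    for target, _design_id, is_hit in records:
        tested[target] += 1
        hits[target] += int(bool(is_hit))
    return {target: hits[target] / tested[target] for target in tested}


def targets_with_binders(records):
    """Targets for which at least one design registered as a hit."""
    return sorted(target for target, rate in hit_rates(records).items() if rate > 0)
```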

Additionally, quantitative binding kinetics were measured for selected targets of interest using surface plasmon resonance (SPR), and western blotting was used to assess protein expression. The generated proteins expressed well and showed high folding stability. Proteina-Complexa produced binders against most targets, including binders with nanomolar and picomolar affinities. For instance, it generated strong binders against Activin Receptor Type-2A, a promising therapeutic target for disorders characterized by muscle wasting. No similar mini-binders for this target have been reported in the literature.
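A note on the affinity units: SPR fits an association rate constant k_on and a dissociation rate constant k_off. The equilibrium dissociation constant is K_D = k_off / k_on, and a lower K_D means tighter binding. The helper below performs that conversion and formats the result. The rate constants in the usage line are illustrative values, not measurements from this study.

```python
def dissociation_constant(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant K_D in molar units.

    k_on: association rate constant in 1/(M*s); k_off: dissociation rate constant in 1/s.
    """
    return k_off / k_on


def format_affinity(kd_molar: float) -> str:
    """Express K_D in the largest convenient unit (mM, uM, nM, pM, otherwise fM)."""
    for unit, scale in (("mM", 1e-3), ("uM", 1e-6), ("nM", 1e-9), ("pM", 1e-12)):
        if kd_molar >= scale:
            return f"{kd_molar / scale:.1f} {unit}"
    return f"{kd_molar / 1e-15:.1f} fM"


# Illustrative rate constants (not measured values from this study).
kd = dissociation_constant(k_on=1e6, k_off=1e-4)
print(format_affinity(kd))  # prints 100.0 pM
```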

Beyond protein targets, the team pushed the boundaries of Proteina-Complexa by designing proteins that bind to sugar molecules on the surface of red blood cells. Designing proteins that bind sugars is a major challenge because carbohydrates are small and highly polar. They are also surrounded by a dense layer of water that typically prevents a protein from forming a stable attachment.

Existing AI tools succeed primarily on hydrophobic (water-repelling) surfaces. Even so, Proteina-Complexa generated 24 candidates for this difficult sugar-binding task. In laboratory assays, four of these designs showed strong agglutination signals. They clumped red blood cells together more efficiently than the natural carbohydrate-binding proteins (lectins) currently used in laboratories.

Additional bio-layer interferometry (BLI) experiments unambiguously confirmed direct binding of a lead candidate to the carbohydrate target. These results on highly polar targets show that Proteina-Complexa can tackle complex medical targets previously considered nearly impossible to design for.

To learn more, see Latent Generative Search unlocks de novo Design of Untapped Biomolecular Interactions at Scale.

How to generate your own protein binders using Proteina-Complexa

The following examples use the Proteina-Complexa command line interface.

Prerequisites

  • Familiarity with Python, YAML configuration files, and basic protein structure concepts
  • Access to at least one NVIDIA A100, H100, or newer GPU

Installation and setup

Step 1: Download the code

# Clone the repository
git clone https://github.com/NVIDIA-Digital-Bio/Proteina-Complexa
cd Proteina-Complexa

Step 2: Set up the environment

Using the uv package manager:

# Create a virtual environment and install packages
./env/build_uv_env.sh
source .venv/bin/activate

# Create the environment configuration file (.env) 
complexa init

Edit the environment configuration file (.env) and set the appropriate environment variable paths:

LOCAL_CODE_PATH=/path/to/Proteina-Complexa/ 
LOCAL_DATA_PATH=/path/to/Proteina-Complexa/assets 

Load the environment configuration:

# Create the shell setup script
complexa init uv

# Load the environment variables into the current session
source env.sh

Step 3: Download model checkpoints

# Download Proteina-Complexa model checkpoints
complexa download --complexa-all

# Download community model checkpoints
complexa download --all

Step 4: Validate your setup

complexa validate design \
   configs/search_binder_local_pipeline.yaml

How to design a binder for a protein target

This example designs binders for PD-L1, a validated therapeutic target.

Step 1: Add the target protein, target information, and binder length

Note that this step is not required for the PD-L1 example because the target protein has already been added.

complexa target add pdl1 \
      --target-path /path/to/your/pdl1.pdb \
      --target-input A1-150 \
      --hotspot-residues A45 A67 A89 \
      --binder-length 60 120
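Before adding your own target, it can help to check that the hotspot residues actually exist in your PDB file and fall inside the target input range. This sketch assumes, based on the example above, two notations. It reads `A1-150` as chain A, residues 1 to 150, and a hotspot such as `A45` as chain A, residue 45. It parses the fixed-column PDB format (chain ID in column 22, residue number in columns 23 to 26) and does not use the Proteina-Complexa CLI.

```python
import re


def parse_range(spec: str):
    """Parse a chain/residue range such as 'A1-150' into ('A', 1, 150)."""
    m = re.fullmatch(r"([A-Za-z])(\d+)-(\d+)", spec)
    if not m:
        raise ValueError(f"unrecognized range: {spec}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def parse_residue(spec: str):
    """Parse a residue such as 'A45' into ('A', 45)."""
    m = re.fullmatch(r"([A-Za-z])(\d+)", spec)
    if not m:
        raise ValueError(f"unrecognized residue: {spec}")
    return m.group(1), int(m.group(2))


def residues_in_pdb(pdb_text: str):
    """Set of (chain, residue number) pairs found in ATOM records."""
    present = set()
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            present.add((line[21], int(line[22:26])))
    return present


def check_hotspots(pdb_text, target_input, hotspots):
    """Return a list of problems; an empty list means every hotspot looks valid."""
    chain, start, end = parse_range(target_input)
    present = residues_in_pdb(pdb_text)
    problems = []
    for spec in hotspots:
        res_chain, res_num = parse_residue(spec)
        if (res_chain, res_num) not in present:
            problems.append(f"{spec}: not found in structure")
        elif res_chain != chain or not start <= res_num <= end:
            problems.append(f"{spec}: outside target input {target_input}")
    return problems
```

For example, check `check_hotspots(open("/path/to/your/pdl1.pdb").read(), "A1-150", ["A45", "A67", "A89"])` before running the command above. It should return an empty list.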

Step 2: Verify that the target was added successfully

complexa target list
complexa target show 02_PDL1

Step 3: Run the full design pipeline: generate -> filter -> evaluate -> analyze

complexa design configs/search_binder_local_pipeline.yaml \
   ++run_name=pdl1_design \
   ++generation.task_name=02_PDL1

Step 4: Monitor the pipeline progress

complexa status

The complexa design command runs all four pipeline stages sequentially. The ++key=value arguments are Hydra overrides, which set YAML configuration parameters from the command line. The ++ prefix adds the key if it is missing and overrides it if it already exists.
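For readers new to Hydra, a plain-Python sketch can show the effect of the two overrides in the PD-L1 command. This is not Hydra's implementation. Real Hydra also parses value types (numbers, booleans, lists) and distinguishes three forms: `key=value` overrides an existing key, `+key=value` adds a new key, and `++key=value` adds or overrides. The sketch handles only the `++` form and keeps values as strings. The `num_samples` key is hypothetical.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with `++dotted.key=value` overrides applied."""
    result = copy.deepcopy(config)
    for override in overrides:
        if not override.startswith("++"):
            raise ValueError(f"this sketch only handles ++key=value overrides: {override}")
        key, sep, value = override[2:].partition("=")
        if not sep:
            raise ValueError(f"missing '=' in override: {override}")
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate groups if absent
        node[leaf] = value  # add the key, or override it if it already exists
    return result


base = {"run_name": "default", "generation": {"task_name": None, "num_samples": "8"}}
cfg = apply_overrides(base, ["++run_name=pdl1_design", "++generation.task_name=02_PDL1"])
print(cfg)
```

Dotted keys address nested YAML groups, so `++generation.task_name=02_PDL1` changes only `task_name` inside the `generation` block. The other keys in that block are left as they are.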

In this case, the pipeline runs four steps:

  • Generates candidate binders with Proteina-Complexa
  • Filters them by AlphaFold2 reward scores
  • Evaluates the top candidates by redesigning their sequences with ProteinMPNN and refolding them with structure prediction
  • Writes a summary CSV containing all metrics
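Once the analyze stage has written its summary CSV, you may want to shortlist designs yourself. The sketch below uses only the standard library. The column names (`binder_plddt`, `interface_pae`, `scrmsd`) and the thresholds are hypothetical placeholders. Check the header of the CSV your run actually produces, and choose thresholds appropriate for your target.

```python
import csv

# Hypothetical column names and thresholds: adapt them to your CSV header.
THRESHOLDS = {
    "binder_plddt": (">=", 80.0),   # structure-prediction confidence of the binder
    "interface_pae": ("<=", 10.0),  # predicted aligned error across the interface
    "scrmsd": ("<=", 2.0),          # self-consistency RMSD (angstroms) after refolding
}


def passes(row: dict) -> bool:
    """True if the row satisfies every threshold in THRESHOLDS."""
    for column, (op, limit) in THRESHOLDS.items():
        value = float(row[column])
        if (op == ">=" and value < limit) or (op == "<=" and value > limit):
            return False
    return True


def shortlist(csv_lines, sort_by="interface_pae"):
    """Rows passing all thresholds, sorted ascending by `sort_by`.

    `csv_lines` is any iterable of CSV text lines, such as an open file object.
    """
    rows = [row for row in csv.DictReader(csv_lines) if passes(row)]
    return sorted(rows, key=lambda row: float(row[sort_by]))
```

Usage: `with open(path_to_summary_csv, newline="") as f: top = shortlist(f)`, where `path_to_summary_csv` is wherever your run wrote its results.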

You can also run each stage individually:

complexa generate configs/search_binder_local_pipeline.yaml  # Generate binder structures
complexa filter configs/search_binder_local_pipeline.yaml    # Filter by reward scores
complexa evaluate configs/search_binder_local_pipeline.yaml  # Evaluate with refolding
complexa analyze configs/search_binder_local_pipeline.yaml   # Aggregate results

How to design a binder for a small molecule target

The ligand binder workflow uses the same four-stage pipeline with a different configuration file that points to the ligand-target model checkpoint. This example designs binders for S-adenosylmethionine (SAM), a small molecule bound to CntL, an aminobutyrate transferase (PDB entry 7C7M).

Step 1: Add the small molecule target

Note that this step is not required for the SAM example because the target ligand has already been added.

complexa target add sam \
    --target-path /path/to/your/7C7M.pdb \
    --ligand SAM \
    --binder-length 100 \
    --dict configs/targets/ligand_targets_dict.yaml
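If you are unsure which residue name to pass to `--ligand`, you can list the hetero-atom residues in your PDB file. This sketch assumes, based on the SAM example, that `--ligand` takes the three-letter residue name from the structure's HETATM records. It reads the fixed-column PDB format: residue name in columns 18 to 20, chain ID in column 22, residue number in columns 23 to 26. Water is skipped.

```python
from collections import Counter


def ligand_residues(pdb_text: str, skip=("HOH", "WAT")):
    """Count HETATM atoms per (residue name, chain, residue number), excluding waters."""
    counts = Counter()
    for line in pdb_text.splitlines():
        if line.startswith("HETATM"):
            resname = line[17:20].strip()
            if resname not in skip:
                counts[(resname, line[21], int(line[22:26]))] += 1
    return counts
```

For example, `ligand_residues(open("/path/to/your/7C7M.pdb").read())` should include an entry whose residue name is `SAM`. If the file also contains other ligands, metal ions, or buffer molecules, they appear as separate entries.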

Step 2: Verify that the target was added successfully

# List all ligand targets in ligand_targets_dict.yaml
complexa target list --dict configs/targets/ligand_targets_dict.yaml

# Show details for the ligand in 7C7M
complexa target show 42_7C7M_LIGAND --dict configs/targets/ligand_targets_dict.yaml

Step 3: Run the ligand binder design pipeline

complexa design configs/search_ligand_binder_local_pipeline.yaml \
   ++run_name=sam_design \
   ++generation.task_name=42_7C7M_LIGAND

The pipeline stages (generate, filter, evaluate, analyze) are identical to the protein target workflow. The only differences are the configuration file (which selects the ligand-target checkpoint) and the target specification format.

Note the following:

  • Proteina-Complexa is designed to run locally on a single-GPU or multi-GPU machine, as well as on a cluster of multiple machines.
  • Both Docker and uv-based virtual environments are supported.

Get started with protein binder design 

Proteina-Complexa is a step forward in computational protein binder design. It combines co-design of fully atomistic structures and sequences with inference-time compute to generate high-quality binders for protein and small molecule targets. It also enables precise scaffolding of enzyme active sites.

By releasing the source code, trained model checkpoints, datasets, and research papers detailing the innovations, we aim to provide a customizable foundation for researchers and developers building the next generation of protein-based therapeutics, catalysts, and biosensors. 

Ready to get started? 

  • Run inference: Generate high-quality, fully atomistic binders for your targets.
  • Train and fine-tune the model: Adapt the Proteina-Complexa model for your use cases.

Check out these resources: 

We invite you to join our collaborators from Manifold Bio, Novo Nordisk, Viva Biotech, Duke University, the University of Cambridge, LMU Munich, and the University of Bonn in exploring the capabilities of Proteina-Complexa to generate protein binders, and more.

Acknowledgments

We would like to acknowledge the following people for their support and contributions to this post: Micha Livne, Tomas Geffner, Zhonglin Cao, Guoqing Zhou, Kushal Shah, Quiara Neam, Xi Chen, Tianjing Zhang, Pia Hardy, Alejandra Rico, Emine Kucukbenli, and Arash Vahdat.
