Rescuing true protein binders from AI hallucinations via zero-shot, ensemble-driven statistical physics scoring

bioRxiv (Preprint) · 18 May 2026

AI Insight

Deep learning-based protein design tools generate many non-functional "hallucinated" structures that current scoring methods cannot reliably filter out. The authors developed Sipobe-PPA, an affinity ranking framework that treats protein-protein interfaces as pseudo-ligands and scores them using a statistical physics force field originally trained on small-molecule interactions, thereby avoiding data leakage issues common to models trained on protein complex data. By combining this zero-shot evaluator with conformational ensembles from multiple AlphaFold3 predictions, Sipobe-PPA achieved an 80% hit rate in its top 5 predictions across de novo protein and antibody design datasets, vastly outperforming physical baselines such as Rosetta-dG (0% hit rate under the same conditions).

Why it matters

This framework could significantly reduce the cost and time of experimental validation in protein engineering and therapeutic antibody or nanobody development by more reliably identifying which computationally designed binders are worth testing in the laboratory.

Confidence

5/10 · Preprint, not yet peer-reviewed · Biology

⚠️ Preprint: not yet peer-reviewed

This article has not yet been reviewed by independent experts. The results are preliminary and should be interpreted with caution.

The advancement of deep generative models has facilitated de novo protein and antibody design, yet translation to experimental success is hindered by a high generation rate of structural decoys. Current affinity predictors and standard structural confidence metrics fail to reliably distinguish these AI hallucinations from true binders. Here, we present Sipobe-PPA, an affinity ranking framework that conceptualizes interacting protein interfaces as pseudo-ligands, evaluating them through an AI-driven statistical physics forcefield. Because this forcefield is trained exclusively on small-molecule interactions, Sipobe-PPA acts as a zero-shot physical evaluator for protein-protein interfaces, protecting the framework against the data leakage and memorization pitfalls that affect models trained directly on protein complex datasets. To capture the structural plasticity of binding interactions, Sipobe-PPA employs a conformational ensemble strategy, computing interaction scores across multiple AlphaFold3 (AF3)-predicted structural states. Benchmarking on decoy-rich de novo datasets, including Bindcraft, Boltzgen, and the Germinal antibody dataset, demonstrates the significant improvement offered by this approach. In a real-world pipeline scenario simulating wet-lab constraints (pre-filtered by AF3 ipTM > 0.8 and pLDDT > 80), Sipobe-PPA achieved an 80% Hit Rate within its Top 5 predictions across the combined dataset, compared to 0% for physical baselines like Rosetta-dG. Notably, our structural ensemble averaging outperformed single-structure scoring, highlighting the necessity of modeling prediction diversity. By maximizing top-tier hit rates across diverse nanobody and de novo targets, Sipobe-PPA provides a scalable screening paradigm that bridges the gap between computational generation and wet-lab viability.
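The evaluation protocol described in the abstract (pre-filter by AF3 confidence, average an interface score over several predicted structures, then count experimental hits among the top-ranked designs) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Design` record, the function names, the use of a plain arithmetic mean, and the convention that a higher score means a better predicted binder are all assumptions made for illustration. The per-structure scores stand in for whatever the Sipobe-PPA forcefield would return.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Design:
    """One designed binder (hypothetical record layout)."""
    name: str
    iptm: float           # AF3 ipTM for the complex
    plddt: float          # AF3 pLDDT
    scores: list[float]   # one interface score per AF3-predicted structure
    is_binder: bool       # experimental label: True if binding was confirmed


def passes_prefilter(d: Design, iptm_min: float = 0.8, plddt_min: float = 80.0) -> bool:
    # Thresholds taken from the abstract: ipTM > 0.8 and pLDDT > 80.
    return d.iptm > iptm_min and d.plddt > plddt_min


def ensemble_score(d: Design) -> float:
    # Assumption: the ensemble score is the arithmetic mean over structures.
    return mean(d.scores)


def top_k_hit_rate(designs: list[Design], k: int = 5) -> float:
    # Keep designs that pass the confidence pre-filter, rank them by
    # ensemble score (assumed: higher is better), and report the
    # fraction of true binders among the top k.
    kept = [d for d in designs if passes_prefilter(d)]
    ranked = sorted(kept, key=ensemble_score, reverse=True)
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(d.is_binder for d in top) / len(top)
```

Under these assumptions, a Top 5 hit rate of 80% corresponds to four experimentally confirmed binders among the five highest-ranked designs that survive the pre-filter. Replacing `ensemble_score` with a function that returns only `d.scores[0]` would give the single-structure baseline that the abstract compares against.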

Source: Rescuing true protein binders from AI hallucinations via zero-shot, ensemble-driven statistical physics scoring