Biology

Protein Circuit Tracing via Cross-layer Transcoders

AI Insight

ProtoMech is a computational framework that uses cross-layer transcoders to map the internal circuits of protein language models (pLMs), applied here to ESM2. By learning sparse latent representations jointly across model layers rather than treating each layer in isolation, it recovers 82-89% of the original model's performance on protein family classification and function prediction tasks. The framework also identifies compressed circuits that use less than 1% of the latent space while retaining up to 79% of model accuracy; these circuits correspond to structural and functional motifs involved in binding, signaling, and stability. Steering along these circuits yields high-fitness protein designs that surpass baseline methods in more than 70% of cases.


Understanding the internal computations of protein language models could accelerate rational protein engineering and drug design by providing interpretable, mechanistic insights rather than treating these models as black boxes. This work bridges mechanistic interpretability research and practical biotechnology applications.


arXiv:2602.12026v2 (announce type: replace-cross)
Abstract: Protein language models (pLMs) have emerged as powerful predictors of protein structure and function. However, the computational circuits underlying their predictions remain poorly understood. Recent mechanistic interpretability methods decompose pLM representations into interpretable features, but they treat each layer independently and thus fail to capture cross-layer computation, limiting their ability to approximate the full model. We introduce ProtoMech, a framework for discovering computational circuits in pLMs using cross-layer transcoders that learn sparse latent representations jointly across layers to capture the model’s full computational circuitry. Applied to the pLM ESM2, ProtoMech recovers 82-89% of the original performance on protein family classification and function prediction tasks. ProtoMech then identifies compressed circuits that use <1% of the latent space while retaining up to 79% of model accuracy, revealing correspondence with structural and functional motifs, including binding, signaling, and stability. Steering along these circuits enables high-fitness protein design, surpassing baseline methods in more than 70% of cases. These results establish ProtoMech as a principled framework for protein circuit tracing.
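
To make the "jointly across layers" idea concrete, here is a minimal sketch of a cross-layer transcoder forward pass. The abstract does not describe ProtoMech's architecture in detail, so this is an illustration of the general technique, not the authors' implementation. It assumes that latents encoded at a given layer may contribute to the reconstructed output at that layer and at every later layer, and it uses a top-k sparsity rule as one possible way to obtain sparse latents. All names (CrossLayerTranscoder, topk_relu, the dimensions) are hypothetical, and the weights are random rather than trained.

```python
import random

def matvec(x, W):
    """Multiply row vector x (length n) by matrix W (n rows, m columns)."""
    return [sum(xi * wi for xi, wi in zip(x, col)) for col in zip(*W)]

def topk_relu(v, k):
    """Apply ReLU, then keep only the k largest activations (others become 0)."""
    v = [max(a, 0.0) for a in v]
    keep = set(sorted(range(len(v)), key=lambda i: v[i], reverse=True)[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(v)]

def rand_matrix(rng, rows, cols):
    """Random Gaussian matrix; stands in for trained weights (assumption)."""
    return [[rng.gauss(0.0, rows ** -0.5) for _ in range(cols)] for _ in range(rows)]

class CrossLayerTranscoder:
    """Toy cross-layer transcoder for a single token position."""

    def __init__(self, n_layers, d_model, d_latent, k, seed=0):
        rng = random.Random(seed)
        self.n_layers, self.d_model, self.k = n_layers, d_model, k
        # One encoder per layer: layer input -> latent pre-activations.
        self.enc = [rand_matrix(rng, d_model, d_latent) for _ in range(n_layers)]
        # One decoder per (source, target) pair with target >= source:
        # latents read at layer `s` can write to the output of layer `t`.
        self.dec = {(s, t): rand_matrix(rng, d_latent, d_model)
                    for s in range(n_layers) for t in range(s, n_layers)}

    def encode(self, layer_inputs):
        """layer_inputs: one d_model vector per layer. Returns sparse latents."""
        return [topk_relu(matvec(layer_inputs[l], self.enc[l]), self.k)
                for l in range(self.n_layers)]

    def decode(self, latents):
        """Reconstruct each layer's output from latents at that and earlier layers."""
        outputs = []
        for t in range(self.n_layers):
            out = [0.0] * self.d_model
            for s in range(t + 1):
                contrib = matvec(latents[s], self.dec[(s, t)])
                out = [a + b for a, b in zip(out, contrib)]
            outputs.append(out)
        return outputs

# Usage: random vectors stand in for per-layer pLM activations.
rng = random.Random(1)
clt = CrossLayerTranscoder(n_layers=3, d_model=8, d_latent=32, k=4)
layer_inputs = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(3)]
latents = clt.encode(layer_inputs)
outputs = clt.decode(latents)
```

In this sketch, zeroing a latent at layer 0 changes the reconstruction at every layer, while zeroing a latent at the last layer affects only the last layer. That asymmetry is what a per-layer decomposition cannot express. A trained version would fit the encoder and decoder weights so that the decoded outputs match the model's actual layer outputs.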

Source: Protein Circuit Tracing via Cross-layer Transcoders