Assessing Generative Model Coverage of Protein Structures with SHAPES.

Tianyu Lu, Melissa H Liu, Yilin Chen, Jinho Kim, Po-Ssu Huang

doi:10.1101/2025.01.09.632260


Open Access


Journal: bioRxiv: the preprint server for biology
Publication Date: Jan 17, 2025
License type: CC BY 4.0


Abstract


Recent advances in generative modeling enable efficient sampling of protein structures, but because these models tend to optimize for designability, they are biased toward idealized structures at the expense of loops and other complex structural motifs critical for function. We introduce SHAPES (Structural and Hierarchical Assessment of Proteins with Embedding Similarity) to evaluate five state-of-the-art generative models of protein structures. Using structural embeddings across multiple structural hierarchies, ranging from local geometries to global protein architectures, we reveal that these models substantially undersample the observed protein structure space. We use Fréchet Protein Distance (FPD) to quantify distributional coverage. The models differ in how their coverage changes across sampling noise scales and temperatures; the frequency of TERtiary Motifs (TERMs) further supports these observations. More robust sequence design and structure prediction methods are likely crucial in guiding the development of models with improved coverage of the designable protein space.
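
The abstract does not spell out how FPD is computed. As a rough illustration only, the sketch below assumes it takes the same form as the standard Fréchet distance between two Gaussians fitted to embedding sets (the form used by Fréchet Inception Distance for images). The function name `frechet_distance` and the random stand-in embeddings are hypothetical and are not taken from SHAPES.

```python
import numpy as np


def frechet_distance(x, y):
    """Frechet distance between Gaussians fitted to two embedding sets.

    x, y: arrays of shape (n_samples, dim), e.g. structural embeddings of
    reference and generated proteins (assumed layout, not from the paper).
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.atleast_2d(np.cov(x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(y, rowvar=False))

    # tr((cov_x cov_y)^(1/2)) equals the sum of square roots of the
    # eigenvalues of the product, which are real and non-negative for
    # positive semidefinite inputs; clip small negative numerical noise.
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)


# Stand-in data: random vectors in place of real structural embeddings.
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 4))
generated = reference + 1.0  # same spread, mean shifted by 1 per dimension

same = frechet_distance(reference, reference)
shifted = frechet_distance(reference, generated)
```

Under this assumed form, a lower value means the generated set's embedding distribution sits closer to the reference distribution. With identical covariances, the distance reduces to the squared distance between the means.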
