Jointly Embedding Protein Structures and Sequences through Residue Level Alignment

Foster Birnbaum, Saachi Jain, Aleksander Madry, Amy E Keating

doi:10.1103/prxlife.2.043013

Journal: PRX Life
Publication Date: Nov 19, 2024
License type: CC BY 4.0

Abstract

The relationships between protein sequences, structures, and functions are determined by complex codes that scientists aim to decipher. While structures contain key information about proteins' biochemical functions, they are often experimentally difficult to obtain. In contrast, protein sequences are abundant but are a step removed from function. In this paper, we propose residue level alignment (RLA), a self-supervised objective for aligning sequence and structure embedding spaces. By situating sequence and structure encoders within the same latent space, RLA enriches the sequence encoder with spatial information. Moreover, our framework enables us to measure the similarity between a sequence and a structure by comparing their RLA embeddings. We show how RLA similarity scores can be used for binder design by selecting true binders from sets of designed binders. RLA scores are informative even when they are calculated given only the backbone structure of the binder and no binder sequence information, which simulates the information available in many early-stage binder design libraries. RLA performs similarly to benchmark methods and is orders of magnitude faster, making it a valuable new screening tool for binder design pipelines.

Published by the American Physical Society 2024
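To make the idea of comparing sequence and structure embeddings concrete, the following is a minimal sketch of one way a residue-level similarity score could be computed and used to rank candidate binders. It is not the authors' implementation. The abstract does not state which similarity function RLA uses, so the mean per-residue cosine similarity here is an assumption, and the names `cosine`, `rla_style_score`, and `rank_candidates` are hypothetical. The embeddings are toy vectors standing in for the outputs of trained sequence and structure encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rla_style_score(seq_emb, struct_emb):
    """Mean per-residue cosine similarity between two embedding lists.

    Assumption: residue i of the sequence embedding is compared with
    residue i of the structure embedding, and the scores are averaged.
    """
    if len(seq_emb) != len(struct_emb):
        raise ValueError("sequence and structure must have the same number of residues")
    if not seq_emb:
        raise ValueError("embeddings must contain at least one residue")
    per_residue = [cosine(s, t) for s, t in zip(seq_emb, struct_emb)]
    return sum(per_residue) / len(per_residue)


def rank_candidates(candidates):
    """Return candidate names sorted from highest to lowest score.

    `candidates` maps a name to a (sequence_embedding, structure_embedding) pair.
    """
    scores = {name: rla_style_score(s, t) for name, (s, t) in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy two-residue, two-dimensional embeddings (illustrative values only).
    designs = {
        "design_a": ([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 3.0]]),
        "design_b": ([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]),
    }
    for name in rank_candidates(designs):
        print(name, rla_style_score(*designs[name]))
```

Because a score of this kind needs only one encoder pass per input followed by a cheap vector comparison, it is consistent with the abstract's point that embedding-based screening can be much faster than heavier benchmark methods.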

Full Text