Abstract
The relationships between protein sequences, structures, and functions are determined by complex codes that scientists aim to decipher. While structures contain key information about proteins' biochemical functions, they are often experimentally difficult to obtain. In contrast, protein sequences are abundant but are a step removed from function. In this paper, we propose residue level alignment (RLA), a self-supervised objective for aligning sequence and structure embedding spaces. By situating sequence and structure encoders within the same latent space, RLA enriches the sequence encoder with spatial information. Moreover, our framework enables us to measure the similarity between a sequence and a structure by comparing their RLA embeddings. We show how RLA similarity scores can be used for binder design by selecting true binders from sets of designed binders. RLA scores are informative even when they are calculated given only the backbone structure of the binder and no binder sequence information, which simulates the information available in many early-stage binder design libraries. RLA performs similarly to benchmark methods and is orders of magnitude faster, making it a valuable new screening tool for binder design pipelines.
Published by the American Physical Society 2024
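The idea of scoring a sequence against a structure by comparing embeddings in a shared latent space can be illustrated with a minimal sketch. It assumes both encoders emit one vector per residue, so each input is an array of shape (number of residues, embedding dimension), and it uses mean per-residue cosine similarity as the score. The function name `residue_cosine_score`, the choice of cosine similarity, and the averaging step are illustrative assumptions and are not taken from the paper, whose actual scoring may differ.

```python
import numpy as np


def residue_cosine_score(seq_emb, struct_emb, eps=1e-8):
    """Mean per-residue cosine similarity between two embedding arrays.

    Hypothetical scoring function for illustration only: both inputs are
    assumed to have shape (n_residues, dim), with row i of each array
    describing the same residue position.
    """
    seq_emb = np.asarray(seq_emb, dtype=float)
    struct_emb = np.asarray(struct_emb, dtype=float)
    if seq_emb.ndim != 2 or seq_emb.shape != struct_emb.shape:
        raise ValueError("expected two arrays of identical shape (n_residues, dim)")

    # Scale each residue vector to (approximately) unit length.
    seq_unit = seq_emb / (np.linalg.norm(seq_emb, axis=1, keepdims=True) + eps)
    struct_unit = struct_emb / (np.linalg.norm(struct_emb, axis=1, keepdims=True) + eps)

    # Row-wise dot product gives one cosine similarity per residue.
    per_residue = np.sum(seq_unit * struct_unit, axis=1)
    return float(per_residue.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    structure = rng.normal(size=(50, 16))                      # placeholder structure embedding
    close = structure + 0.1 * rng.normal(size=structure.shape)  # well-aligned placeholder sequence
    unrelated = rng.normal(size=structure.shape)                # unrelated placeholder sequence
    print(residue_cosine_score(close, structure))
    print(residue_cosine_score(unrelated, structure))
```

In a screening setting, candidates would be ranked by this score, with higher values indicating closer agreement between the two embeddings. The random arrays above are placeholders and do not represent real protein embeddings.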