Abstract

Amino acid covariation, where the identities of amino acids at different sequence positions are correlated, is a hallmark of naturally occurring proteins. This covariation can arise from multiple factors, including selective pressures for maintaining protein structure, requirements imposed by a specific function, or phylogenetic sampling bias. Here we employed flexible backbone computational protein design to quantify the extent to which protein structure has constrained amino acid covariation for 40 diverse protein domains. We find significant similarities between the amino acid covariation in alignments of natural protein sequences and sequences optimized for their structures by computational protein design methods. These results indicate that the structural constraints imposed by protein architecture play a dominant role in shaping amino acid covariation and that computational protein design methods can capture these effects. We also find that the similarity between natural and designed covariation is sensitive to the magnitude and mechanism of backbone flexibility used in computational protein design. Our results thus highlight the necessity of including backbone flexibility to correctly model precise details of correlated amino acid changes and give insights into the pressures underlying these correlations.

Highlights

  • Evolutionary selective pressures on protein structure and function have shaped the sequences of today’s naturally occurring proteins [1,2,3]

  • We focus on the physical basis of evolutionary pressures that act on interactions between amino acids in folded proteins, which are critical in determining protein structure and function

  • We find similar patterns of amino acid covariation in natural sequences and sequences optimized for their structures using computational protein design, demonstrating the importance of structural constraints in protein molecular evolution and providing insights into the structural mechanisms leading to covariation

Introduction

Evolutionary selective pressures on protein structure and function have shaped the sequences of today’s naturally occurring proteins [1,2,3]. It is often assumed that, given a natural polypeptide backbone conformation, an accurate protein design algorithm should be able to predict sequences that are similar to the natural protein sequence. This test is commonly referred to as native sequence recovery [4], and it has been used extensively to evaluate various protein design sampling methods and energy functions [6,7,8]. A more useful computational test of these approaches involves comparing designed sequences with a set of reference sequences, either naturally occurring or experimentally derived, that share the desired protein fold. This comparison can be based on sequence profile similarity, which involves quantifying the difference between the frequencies of observing each amino acid at corresponding positions in the designed and reference sequences [16,17,19].
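
The two tests described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the sequences are assumed to be aligned, gap-free, equal-length strings over the 20 standard amino acids, and the per-position total variation distance is an assumed choice of difference measure, since the text does not say which one is used.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def native_sequence_recovery(designed, native):
    """Fraction of positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have equal length")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


def position_frequencies(alignment):
    """Per-position amino acid frequencies for a list of aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have equal length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        profile.append({a: counts[a] / len(alignment) for a in AMINO_ACIDS})
    return profile


def profile_difference(designed, reference):
    """Per-position total variation distance between two sequence profiles.

    Assumed metric: 0.5 * sum over amino acids of |p - q|, which is 0 for
    identical frequency distributions and 1 for non-overlapping ones.
    """
    p = position_frequencies(designed)
    q = position_frequencies(reference)
    if len(p) != len(q):
        raise ValueError("alignments must cover the same positions")
    return [
        0.5 * sum(abs(pc[a] - qc[a]) for a in AMINO_ACIDS)
        for pc, qc in zip(p, q)
    ]
```

For example, `native_sequence_recovery("ACDE", "ACDF")` compares one designed sequence with one native sequence position by position, while `profile_difference` compares two whole sets of sequences and returns one value per alignment position. The second comparison can register agreement at positions where the reference set itself tolerates several amino acids.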
