Abstract
Patterns of amino acid covariation in large protein sequence alignments can inform the prediction of de novo protein structures, binding interfaces, and mutational effects. While algorithms that detect these so-called evolutionary couplings between residues have proven useful for practical applications, less is known about how and why these methods perform so well, and what insights into biological processes can be gained from their application. Evolutionary coupling algorithms are commonly benchmarked by comparison to true structural contacts derived from solved protein structures. However, the methods used to determine true structural contacts are not standardized and different definitions of structural contacts may have important consequences for interpreting the results from evolutionary coupling analyses and understanding their overall utility. Here, we show that evolutionary coupling analyses are significantly more likely to identify structural contacts between side-chain atoms than between backbone atoms. We use both simulations and empirical analyses to highlight that purely backbone-based definitions of true residue–residue contacts (i.e., based on the distance between Cα atoms) may underestimate the accuracy of evolutionary coupling algorithms by as much as 40% and that a commonly used reference point (Cβ atoms) underestimates the accuracy by 10–15%. These findings show that co-evolutionary outcomes differ according to which atoms participate in residue–residue interactions and suggest that accounting for different interaction types may lead to further improvements to contact-prediction methods.
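The benchmarking logic behind this comparison can be sketched in a few lines. The example below computes the precision of one fixed list of predicted residue pairs against two different "true" contact sets. The pair lists and contact sets are hypothetical toy values, not data from this study. The point is only that identical predictions receive different accuracy scores depending on which contact definition serves as ground truth.

```python
def precision(predicted_pairs, true_contacts):
    """Fraction of predicted residue pairs that appear in the true contact set."""
    if not predicted_pairs:
        return 0.0
    # Normalize pairs so that (i, j) and (j, i) count as the same contact.
    truth = {tuple(sorted(p)) for p in true_contacts}
    hits = sum(1 for p in predicted_pairs if tuple(sorted(p)) in truth)
    return hits / len(predicted_pairs)


# Hypothetical top-ranked evolutionary-coupling pairs (residue indices).
predicted = [(3, 20), (5, 18), (7, 30), (10, 41), (12, 25)]

# Hypothetical ground-truth contact sets under two definitions; in this toy
# case the side-chain-aware set happens to contain more pairs.
contacts_ca = {(3, 20), (7, 30)}
contacts_sidechain = {(3, 20), (5, 18), (7, 30), (12, 25)}

print(precision(predicted, contacts_ca))         # 0.4
print(precision(predicted, contacts_sidechain))  # 0.8
```

Precision is used here because it is a simple, common way to score ranked contact predictions; the study's own accuracy measure may differ.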
Highlights
A long-standing problem in physical biology is to predict the structure of a protein based solely on its amino acid sequence (Anfinsen, 1973; Sadowski & Jones, 2009; Marks, Hopf & Sander, 2012).
By determining an “evolutionary coupling” score for all pairs of amino acid residues within a sequence alignment, and assuming that the highest-scoring residue–residue pairs are in close spatial proximity within the structure, the search space of computational protein folding methods can be constrained, resulting in accurate 3D-structure prediction (Marks et al, 2011; Hopf et al, 2012; Ovchinnikov et al, 2017).
Structural contact definitions: putatively true interactions between amino acid residues within a given protein family are frequently derived from the distance between residues in a representative protein structure.
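The selection step described above can be sketched as follows, assuming a symmetric matrix of coupling scores has already been computed by some method. Two widely used but here-assumed conventions are applied: pairs closer than a minimum sequence separation are ignored, and only a fixed number of top-scoring pairs is kept (often a fraction or multiple of the sequence length L). The function name `top_pairs`, the toy score matrix, and the `min_sep` default of 6 are illustrative choices, not parameters taken from this work.

```python
from typing import List, Tuple


def top_pairs(scores: List[List[float]], n_top: int,
              min_sep: int = 6) -> List[Tuple[int, int]]:
    """Return the n_top highest-scoring residue pairs (i, j) with i < j,
    skipping pairs fewer than min_sep positions apart in sequence."""
    n = len(scores)
    candidates = [
        (scores[i][j], i, j)
        for i in range(n)
        for j in range(i + min_sep, n)
    ]
    # Sort by descending score; ties are broken by residue indices for determinism.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(i, j) for _, i, j in candidates[:n_top]]


# Toy symmetric score matrix for a 10-residue sequence.
L = 10
scores = [[0.0] * L for _ in range(L)]
for i, j, s in [(0, 8, 0.9), (1, 7, 0.7), (2, 9, 0.5), (3, 4, 0.95)]:
    scores[i][j] = scores[j][i] = s

print(top_pairs(scores, n_top=3))  # [(0, 8), (1, 7), (2, 9)]
```

The highest-scoring pair, (3, 4), is excluded because its residues are adjacent in sequence, where proximity is trivially expected and tells little about the fold.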
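The sketch below shows how the choice of reference atoms changes which residue pairs count as contacts. It compares a backbone-only Cα distance, a Cβ distance, and the minimum distance between any atoms of two residues. The coordinates are invented toy values, not a real structure. The cutoffs (8 Å for Cα and Cβ, 5 Å for the minimum-atom definition) and the glycine fallback from Cβ to Cα are assumed conventions for illustration, not the thresholds used in this work.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

Coord = Tuple[float, float, float]
Residue = Dict[str, Coord]  # atom name -> (x, y, z) coordinates in angstroms


def ca_distance(a: Residue, b: Residue) -> float:
    """Backbone-only definition: distance between C-alpha atoms."""
    return math.dist(a["CA"], b["CA"])


def cb_distance(a: Residue, b: Residue) -> float:
    """C-beta definition; glycine (no CB) falls back to CA, an assumed convention."""
    return math.dist(a.get("CB", a["CA"]), b.get("CB", b["CA"]))


def min_atom_distance(a: Residue, b: Residue) -> float:
    """Closest approach between any atom of one residue and any atom of the other."""
    return min(math.dist(p, q) for p in a.values() for q in b.values())


def contacts(residues: List[Residue],
             dist_fn: Callable[[Residue, Residue], float],
             cutoff: float) -> Set[Tuple[int, int]]:
    """All residue index pairs (i, j), i < j, whose distance is below cutoff."""
    return {
        (i, j)
        for i, j in combinations(range(len(residues)), 2)
        if dist_fn(residues[i], residues[j]) < cutoff
    }


# Toy coordinates: residues 0 and 1 point their side chains at each other;
# residue 2 reaches toward residue 0 with a long side chain.
# Backbone N, C and O atoms are omitted for brevity.
residues = [
    {"CA": (0.0, 0.0, 0.0), "CB": (1.5, 0.0, 0.0), "CG": (3.0, 0.0, 0.0)},
    {"CA": (9.0, 0.0, 0.0), "CB": (7.5, 0.0, 0.0), "CG": (6.0, 0.0, 0.0)},
    {"CA": (0.0, 10.5, 0.0), "CB": (0.0, 9.0, 0.0), "CG": (0.0, 7.5, 0.0),
     "CD": (0.0, 6.0, 0.0), "CE": (0.0, 4.5, 0.0)},
]

print(sorted(contacts(residues, ca_distance, 8.0)))        # []
print(sorted(contacts(residues, cb_distance, 8.0)))        # [(0, 1)]
print(sorted(contacts(residues, min_atom_distance, 5.0)))  # [(0, 1), (0, 2)]
```

In this toy case the Cα definition finds no contacts, the Cβ definition finds one, and only the atom-level definition recovers the side-chain-mediated pair (0, 2).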
Summary
A long-standing problem in physical biology is to predict the structure of a protein based solely on its amino acid sequence (Anfinsen, 1973; Sadowski & Jones, 2009; Marks, Hopf & Sander, 2012). Beyond structure prediction, other applications have used evolutionary coupling scores to predict protein binding partners and interfaces (Burger & Van Nimwegen, 2008; Hopf et al, 2014; Ovchinnikov, Kamisetty & Baker, 2014), as well as the effect of mutations on protein stability and function (Hopf et al, 2017). Many of these approaches have been further improved through the use of machine learning (Cheng & Baldi, 2007; Jones et al, 2015; Michel et al, 2017) and deep neural networks that leverage evolutionary couplings alongside numerous other protein features (Tegge et al, 2009; Di Lena, Nagata & Baldi, 2012; Xiong, Zeng & Gong, 2017; Stahl, Schneider & Brock, 2017; He et al, 2017; Wang et al, 2017; Riesselman, Ingraham & Marks, 2018; Liu et al, 2018; Wozniak et al, 2018; Jones & Kandathil, 2018; Adhikari, Hou & Cheng, 2018; Hanson et al, 2018).