Abstract

Statistical analysis of alignments of large numbers of protein sequences has revealed “sectors” of collectively coevolving amino acids in several protein families. Here, we show that selection acting on any functional property of a protein, represented by an additive trait, can give rise to such a sector. As an illustration of a selected trait, we consider the elastic energy of an important conformational change within an elastic network model, and we show that selection acting on this energy leads to correlations among residues. For this concrete example and more generally, we demonstrate that the main signature of functional sectors lies in the small-eigenvalue modes of the covariance matrix of the selected sequences. However, secondary signatures of these functional sectors also exist in the extensively studied large-eigenvalue modes. Our simple, general model leads us to propose a principled method to identify functional sectors, along with the magnitudes of mutational effects, from sequence data. We further demonstrate the robustness of these functional sectors to various forms of selection, and the robustness of our approach to the identification of multiple selected traits.
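The central claim above, that selection on an additive trait leaves its main signature in the small-eigenvalue modes of the covariance matrix, can be illustrated with a minimal numerical sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names (`select_sequences`, `smallest_mode`, `overlap`), the Gaussian acceptance window and its `width`, the sequence length of 20, and the randomly drawn mutational effects `delta`.

```python
import numpy as np

def select_sequences(delta, n_random=100_000, width=0.5, seed=0):
    """Draw random binary sequences and keep each one with a Gaussian-window
    probability centred on the mean trait value (hypothetical selection rule)."""
    rng = np.random.default_rng(seed)
    seqs = rng.integers(0, 2, size=(n_random, delta.size))
    trait = seqs @ delta                 # additive trait: sum over sites of delta_l * s_l
    target = 0.5 * delta.sum()           # mean trait of the unselected ensemble
    p_keep = np.exp(-((trait - target) ** 2) / (2.0 * width ** 2))
    return seqs[rng.random(n_random) < p_keep]

def smallest_mode(seqs):
    """Return the smallest eigenvalue of the site covariance matrix and its eigenvector."""
    cov = np.cov(seqs, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    return eigvals[0], eigvecs[:, 0]

def overlap(u, v):
    """Absolute cosine similarity between two vectors."""
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Illustrative mutational effects (assumed, not from the paper).
delta = np.random.default_rng(1).normal(size=20)
selected = select_sequences(delta)
lam_min, mode = smallest_mode(selected)
print(len(selected), lam_min, overlap(mode, delta))
```

In this toy setting the variance of the selected ensemble is strongly reduced along the direction of `delta` and stays close to its unselected value of 1/4 in the other directions, so the eigenvector with the smallest eigenvalue aligns with the vector of mutational effects.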

Highlights

  • We calculate the random expectation of the Recovery of the mutational-effect vector ∆ by a generic other vector ν, in order to establish a null model for comparison

  • The length-L vector ∆ of single-site mutational effects introduced in the two-state case in the main text is replaced by a (q − 1) × L matrix of mutational effects, each being denoted by ∆_l(α_l)

  • To quantify the performance of Inverse Covariance Off-Diagonal (ICOD) and to compare it to Statistical Coupling Analysis (SCA) over a range of selection biases, we focused on binary sequences


Summary

Recovery by a random vector

We calculate the random expectation of the Recovery of the mutational-effect vector ∆ by a generic other vector ν, in order to establish a null model for comparison. L denotes the length of the sequence, and hence the number of components of both ∆ and ν. Note that we employ the usual convention that empty products are equal to one, so that Eq 2 yields ν_1 = cos θ_1. In the relevant regime L ≫ 1, an asymptotic expansion of Γ gives the expected Recovery in closed form. The maximum expectation of Recovery (Recovery_max) is obtained when all components of ∆, i.e. all mutational effects, are identical. The average Recovery becomes minimal (Recovery_min) when only one component of ∆ is nonzero, which is the limiting case where the mutational effect at one site is dominant.
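The null model can be checked numerically with a short sketch. It rests on two assumptions that this summary does not state. First, Recovery is taken to be Σ_l |ν_l ∆_l| / (‖ν‖ ‖∆‖). Second, the random vector ν is taken to be uniformly distributed in direction. Under these assumptions the mean of |ν_1| for a unit vector is Γ(L/2) / (√π Γ((L+1)/2)), which behaves as √(2/(πL)) for L ≫ 1. The expected Recovery is then approximately √(2/π) when all components of ∆ are equal, and √(2/(πL)) when a single component is nonzero. The function names below are hypothetical.

```python
import numpy as np
from math import lgamma, exp, pi, sqrt

def recovery(nu, delta):
    """Assumed definition: sum_l |nu_l * delta_l| / (||nu|| * ||delta||), one value per row of nu."""
    nu = np.atleast_2d(nu)
    numerator = np.abs(nu * delta).sum(axis=1)
    return numerator / (np.linalg.norm(nu, axis=1) * np.linalg.norm(delta))

def mean_abs_component(L):
    """E|nu_1| for nu uniform on the unit sphere in R^L, computed via log-gamma for stability."""
    return exp(lgamma(L / 2) - lgamma((L + 1) / 2)) / sqrt(pi)

def expected_random_recovery(delta):
    """Analytical expectation of Recovery for a randomly oriented nu (under the assumptions above)."""
    delta = np.asarray(delta, dtype=float)
    return mean_abs_component(delta.size) * np.abs(delta).sum() / np.linalg.norm(delta)

def monte_carlo_recovery(delta, n_samples=20_000, seed=0):
    """Empirical mean Recovery over isotropic Gaussian vectors, whose directions are uniform."""
    rng = np.random.default_rng(seed)
    delta = np.asarray(delta, dtype=float)
    nu = rng.normal(size=(n_samples, delta.size))
    return recovery(nu, delta).mean()

L = 100
uniform = np.ones(L)                 # all mutational effects identical
one_hot = np.zeros(L)
one_hot[0] = 1.0                     # a single dominant site
for name, delta in [("uniform", uniform), ("one-hot", one_hot)]:
    print(name, expected_random_recovery(delta), monte_carlo_recovery(delta))
```

Comparing the analytical and Monte Carlo columns for the two extreme choices of ∆ gives the range within which the Recovery of a random vector is expected to fall, which is the baseline any inferred vector should exceed.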

Inverse covariance matrix of our sequence ensembles
Binary sequences
First-order small-coupling expansion
Application to our sector model
From a sector model to a Potts model for sequences
Selection on multiple traits
Robustness of functional sectors and of ICOD
Robustness of functional sectors to different forms of selection
Multiple states per site and alternative gauge choice
Pseudocounts
Performance of SCA
Analysis of the SCA method
Comparison between ICOD and SCA
Performance of a method based on the generalized Hopfield model
Findings
Application of ICOD to a multiple sequence alignment of PDZ domains