Abstract

Background

The strength of selective constraints operating on amino acid sites of proteins has a multifactorial nature. In fact, amino acid sites within proteins coevolve due to their functional and/or structural relationships. Different methods have been developed that attempt to account for the evolutionary dependencies between amino acid sites, and researchers have invested significant effort in increasing the sensitivity of such methods. However, the difficulty of disentangling functional co-dependencies from historical covariation has fuelled scepticism over their power to detect biologically meaningful results. In addition, the biological parameters connecting linear sequence evolution to structure evolution remain elusive. For these reasons, most evolutionary studies aimed at identifying functional dependencies among protein domains have focused on the structural properties of proteins rather than on the information extracted from linear multiple sequence alignments (MSAs). Non-parametric methods to detect coevolution have been reported to be especially susceptible to producing false positive results, depending on the properties of the MSAs. However, no formal statistical analysis has been performed to definitively test the differential effects of these properties on the sensitivity of such methods.

Results

Here we test the effect that variations in MSA properties have on the sensitivity of non-parametric methods to detect coevolution. Specifically, we test the effect that the size of the MSA (number of sequences), the mean pairwise amino acid distance per site and the strength of the coevolution signal have on the ability of non-parametric methods to detect coevolution. Our results indicate that all three factors have significant effects on the accuracy of non-parametric methods. Further, introducing statistical filters improves the sensitivity and increases the statistical power of the methods to detect functional coevolution.

Statistical analysis of the physico-chemical properties of amino acid sites in the context of the protein structure reveals striking dependencies among amino acid sites. Results indicate a covariation trend in the hydrophobicities and molecular weight characteristics of amino acid sites when analysing a non-redundant set of 8000 protein structures. Using this biological information as a filter in coevolutionary analyses minimises the false positive rate of these methods. Application of these filters to three different proteins with known functional domains supports the importance of using biological filters to detect coevolution.

Conclusion

Coevolutionary analyses using non-parametric methods have proved difficult and highly prone to providing spurious results, depending on the properties of MSAs and on the strength of coevolution between amino acid sites. The application of statistical filters to the number of pairs detected as coevolving significantly reduces the number of artifactual results. Analysis of the physico-chemical properties of amino acid sites in the protein structure context reveals their structure-dependent covariation. The application of this known biological information to the analysis of covariation greatly enhances the functional coevolutionary signal and removes historical covariation. Simultaneous use of statistical and biological data is instrumental in the detection of functional amino acid site dependencies and compensatory changes at the protein level.

Highlights

  • The strength of selective constraints operating on amino acid sites of proteins has a multifactorial nature

  • One of the most important results obtained in this analysis was that the percentage of positive values (PPV; see Material and Methods for description) increases from a maximum value of approximately 20% in a previous work where no filter was applied [7] to a maximum value of 82% when using 20 sequences, and 80% when using 50 and 100 sequences (Figure 2)

  • We tested the effect on PPV and SN of variations in the size of the multiple sequence alignments (MSAs, with sizes ranging between 20 and 100 sequences), the mean pairwise amino acid distance and the strength of coevolution


Summary

Results

Filtering by the parsimony information criterion removes most of the stochastic coevolution

A site was considered valid for our analyses only if it carried enough information to remove the effects of phylogenetic asymmetry on the data. One of the most important results obtained in this analysis was that the percentage of positive values (PPV; see Material and Methods for description) increases from a maximum value of approximately 20% in a previous work where no filter was applied [7] to a maximum value of 82% when using 20 sequences, and 80% when using 50 and 100 sequences (Figure 2). This substantial increase in PPV suggests that amino acid distances per site are an important factor to take into account in this kind of approach, as previously suggested [17]. Inspection of the mean PPV values in MSAs of different sizes did not show a clear tendency when varying the number of sequences (Figure 2a to 2c). This result may be explained by the parsimony filter making the MI-based method more robust to variations in MSA size, because stochastic and phylogenetic covariation have a smaller effect once the filter is applied. Non-parsimony-informative coevolving sites are highlighted as an example of phylogenetic coevolution.
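As a rough illustration of the filtering idea described above, the following minimal Python sketch keeps only parsimony-informative alignment columns and then scores the remaining column pairs by mutual information (MI). It is not the authors' implementation. It assumes the textbook definition of a parsimony-informative site (at least two residue types, each present in at least two sequences), plain Shannon MI in bits with no correction, and gaps encoded as "-". The function names are hypothetical.

```python
from collections import Counter
from math import log2


def is_parsimony_informative(column):
    """Assumed definition: at least two residue types, each seen in >= 2 sequences."""
    counts = Counter(r for r in column if r != "-")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def mutual_information(col_a, col_b):
    """Shannon MI (bits) between two columns, skipping rows with a gap in either."""
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    freq_a = Counter(a for a, _ in pairs)
    freq_b = Counter(b for _, b in pairs)
    freq_ab = Counter(pairs)
    return sum(
        (c / n) * log2((c / n) / ((freq_a[a] / n) * (freq_b[b] / n)))
        for (a, b), c in freq_ab.items()
    )


def filtered_mi_pairs(msa):
    """Return {(i, j): MI} for all pairs of parsimony-informative columns.

    `msa` is a list of equal-length aligned sequences (strings).
    """
    columns = ["".join(seq[i] for seq in msa) for i in range(len(msa[0]))]
    informative = [i for i, col in enumerate(columns) if is_parsimony_informative(col)]
    return {
        (i, j): mutual_information(columns[i], columns[j])
        for k, i in enumerate(informative)
        for j in informative[k + 1:]
    }


# Toy alignment: columns 0 and 1 covary perfectly, column 2 is not informative.
toy_msa = ["AKL", "AKL", "GEL", "GEV"]
toy_scores = filtered_mi_pairs(toy_msa)
```

In the toy alignment, the third column (three L, one V) is discarded before any MI is computed, so it cannot contribute a spurious pair. Raw MI values would still need a significance test or threshold, which this sketch leaves out.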
