Abstract

Determining which proteins interact together is crucial to a systems-level understanding of the cell. Recently, algorithms based on Direct Coupling Analysis (DCA) pairwise maximum-entropy models have made it possible to identify interaction partners among paralogous proteins from sequence data. This success of DCA at predicting protein-protein interactions could be mainly based on its known ability to identify pairs of residues that are in contact in the three-dimensional structure of protein complexes and that coevolve to remain physicochemically complementary. However, interacting proteins also possess similar evolutionary histories. What is the role of purely phylogenetic correlations in the performance of DCA-based methods to infer interaction partners? To address this question, we employ controlled synthetic data that only involve phylogeny and no interactions or contacts. We find that DCA accurately identifies the pairs of synthetic sequences that share evolutionary history. Thus, while phylogenetic correlations confound the identification of contacting residues by DCA, they are useful for predicting interacting partners among paralogs. We find that DCA performs as well as phylogenetic methods to this end, and slightly better than them with large and accurate training sets. Employing DCA or phylogenetic methods within an Iterative Pairing Algorithm (IPA) makes it possible to predict pairs of evolutionary partners without a training set. We further demonstrate the ability of these various methods to correctly predict pairings among real paralogous proteins with genome proximity but no known direct physical interaction, illustrating the importance of phylogenetic correlations in natural data. However, for physically interacting and strongly coevolving proteins, DCA and mutual information outperform phylogenetic methods. We finally discuss how to distinguish physically interacting proteins from proteins that only share a common evolutionary history.
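
As a rough illustration of the kind of sequence correlation that mutual information quantifies, the sketch below computes the plug-in mutual information between two columns of a paired alignment. The function name `column_mutual_information` and the toy columns are hypothetical, chosen only for illustration. This is not the scoring scheme of the methods discussed here, which typically also involve corrections such as pseudocounts (assumed, not shown).

```python
from collections import Counter
from math import log


def column_mutual_information(col_a, col_b):
    """Plug-in mutual information (in nats) between two aligned columns.

    col_a and col_b are equal-length sequences of residue symbols, where
    position k in each column comes from the same paired sequence.
    No pseudocounts or sequence reweighting are applied (illustrative only).
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must contain the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)                # single-site counts, column A
    count_b = Counter(col_b)                # single-site counts, column B
    count_ab = Counter(zip(col_a, col_b))   # joint counts over paired rows
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p_ab * log(p_ab / (p_a * p_b)), written with raw counts
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Toy columns: the first pair is perfectly correlated, the second is independent.
coupled = column_mutual_information("AACC", "DDEE")
independent = column_mutual_information("AACC", "DEDE")
```

In this toy case the perfectly correlated pair yields a mutual information of log 2 nats and the independent pair yields zero. Note that such a score does not by itself distinguish correlations arising from structural contacts from those arising from shared phylogeny, which is the question raised above.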

Highlights

  • The vast majority of cellular processes are carried out by interacting proteins

  • We show that in cases with shared evolutionary history but without known physical interactions, DCA-based and phylogenetic methods work with similar accuracy, while for some physically interacting systems, Direct Coupling Analysis (DCA) and mutual information outperform phylogenetic methods

  • Global statistical models [2, 3] built from the observed sequence correlations using the maximum entropy principle [2, 4], and assuming pairwise interactions, known as Direct Coupling Analysis (DCA) [5], have been used with success to determine three-dimensional protein structures from sequences [6,7,8], to analyze mutational effects [9,10,11,12] and conformational changes [13, 14], to find residue contacts between known interaction partners [5, 15,16,17,18,19,20,21], and most recently to predict interaction partners among paralogs from sequence data [22, 23]



Introduction

The vast majority of cellular processes are carried out by interacting proteins. Functional interactions between proteins allow multi-protein complexes to properly assemble, and ensure the specificity of signal transduction pathways. The amino-acid sequences of interacting proteins are correlated, both because of evolutionary constraints arising from the need to maintain physico-chemical complementarity among contacting amino-acids, and because of shared evolutionary history. The success of DCA-based approaches at predicting protein-protein interactions [22, 23] could originate only from correlations between residues that are in direct contact in the three-dimensional protein complex structure and that need to maintain physico-chemical complementarity. The similarities in the phylogenies of interacting protein families can arise from the coevolution of residues in structural contact, from more global shared evolutionary pressures that result in similar evolutionary rates [36,37,38,39,40], and from shared evolutionary history unrelated to constraints, including common timing of speciation and gene duplication events [39]. A method based on mutual information (MI) was recently shown to slightly

