Abstract

Correlated changes of nucleic or amino acids have provided strong information about the structures and interactions of molecules. Despite the rich literature on coevolutionary sequence analysis, previous methods often have to trade off among generality, simplicity, phylogenetic information, and specific knowledge about interactions. Furthermore, despite the evidence of coevolution in selected protein families, a comprehensive screening of coevolution among all protein domains is still lacking. We propose an augmented continuous-time Markov process model for sequence coevolution. The model can handle different types of interactions, incorporates phylogenetic information and sequence substitution, has only one extra free parameter, and requires no knowledge about interaction rules. We apply this model to large-scale screenings of the entire protein domain database (Pfam). Strikingly, with 0.1 trillion tests executed, the majority of the inferred coevolving protein domains are functionally related, and the coevolving amino acid residues are spatially coupled. Moreover, many of the coevolving positions are located at functionally important sites of proteins/protein complexes, such as the subunit linkers of superoxide dismutase, the tRNA binding sites of ribosomes, the DNA binding region of RNA polymerase, and the active and ligand binding sites of various enzymes. The results suggest that sequence coevolution manifests structural and functional constraints of proteins. The intricate relations between sequence coevolution and various selective constraints are worth pursuing at a deeper level.

Highlights

  • Coevolution is prevalent at species, organismic, and molecular levels

  • Coevolving positions in the same proteins or protein complexes are spatially coupled, as they tend to be closer than random positions in the 3-D structures of the proteins/protein complexes

  • Many coevolving positions are located at functionally important sites of the molecules

Introduction

Coevolution is prevalent at species, organismic, and molecular levels. At the molecular level, selective constraints operate on the entire system, which often requires coordinated changes of its components. Coordinated changes of amino acid residues have been investigated. These studies acquired one or two families of aligned sequences and examined covariation between aligned positions or between entire sequences. Some of them applied different covariation metrics, including correlation coefficients [7,8], mutual information [9,10,11,12,13], and the deviance between marginal and conditional distributions [14]. In addition to direct physical interactions, distant coevolving amino acid residues are reported to be energetically coupled [14] or subject to the functional constraints of the proteins [8].
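As an illustration of one of the covariation metrics mentioned above, the following is a minimal sketch of mutual information between two columns of a multiple sequence alignment. It is not the model proposed in this paper: it treats sequences as independent samples and ignores phylogeny. The function name `mutual_information` and the toy alignment are assumptions made for this example only.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two aligned columns.

    Each column is a sequence of residues, one per aligned sequence.
    Sequences are treated as independent samples (no phylogenetic correction).
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must contain the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)              # marginal residue counts, column A
    count_b = Counter(col_b)              # marginal residue counts, column B
    count_ab = Counter(zip(col_a, col_b)) # joint residue-pair counts
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) / (p(a) p(b)) simplifies to n_ab * n / (n_a * n_b)
        mi += (n_ab / n) * math.log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical toy alignment: four sequences, four aligned positions.
alignment = ["AKLE", "AKLD", "GRLE", "GRLD"]
columns = list(zip(*alignment))  # transpose rows (sequences) into columns

mi_covarying = mutual_information(columns[0], columns[1])    # A/G changes together with K/R
mi_independent = mutual_information(columns[0], columns[3])  # E/D varies independently of A/G
mi_conserved = mutual_information(columns[0], columns[2])    # fully conserved column
```

In this toy case the perfectly covarying pair of columns scores 1 bit, while the independent and conserved pairs score 0. On real alignments, shared ancestry alone can inflate such scores, which is one motivation for methods that incorporate phylogenetic information.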

