Abstract

Covariation between positions in a multiple sequence alignment may reflect structural, functional, and/or phylogenetic constraints and can be analyzed by a wide variety of methods. We explored several of these methods for their ability to identify covarying positions related to the divergence of a protein family at different hierarchical levels. Specifically, we compared seven methods on a model system composed of three nested sets of G-protein-coupled receptors (GPCRs) in which a divergence event occurred. The covariation methods analyzed were based on: χ2 test, mutual information, substitution matrices, and perturbation methods. We first analyzed the dependence of the covariation scores on residue conservation (measured by sequence entropy), and then we analyzed the networking structure of the top pairs. Two methods out of seven--OMES (Observed minus Expected Squared) and ELSC (Explicit Likelihood of Subset Covariation)--favored pairs with intermediate entropy and a networking structure with a central residue involved in several high-scoring pairs. This networking structure was observed for the three sequence sets. In each case, the central residue corresponded to a residue known to be crucial for the evolution of the GPCR family and the subfamily specificity. These central residues can be viewed as evolutionary hubs, in relation with an epistasis-based mechanism of functional divergence within a protein family.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call