Abstract

Background: Protein sequences are subject to a mosaic of constraint. Changes to functional domains and buried residues, for example, are more apt to disrupt protein structure and function than are changes to residues participating in loops or exposed to solvent. Regions of constraint on the tertiary structure of a protein often result in loose segmentation of its primary structure into stretches of slowly- and rapidly-evolving amino acids. This clustering can be exploited, and existing methods have done so by relying on local sequence conservation as a signature of selection to help identify functionally important regions within proteins. We invert this paradigm by leveraging the regional nature of protein structure and function to both illuminate and make use of genome-wide patterns of local sequence conservation.

Results: Our hypothesis is that the regional nature of structural and functional constraints will assert a positive autocorrelation on the evolutionary rates of neighboring sites, which, in a pairwise comparison of orthologous proteins, will manifest itself as the clustering of non-synonymous changes across the amino acid sequence. We introduce a dispersion ratio statistic to test this and related hypotheses. Using genome-wide interspecific comparisons of orthologous protein pairs, we reveal a strong log-linear relationship between the degree of clustering and the intensity of constraint. We further demonstrate how this relationship varies with the evolutionary distance between the species being compared. We provide some evidence that proteins with a history of positive selection deviate from genome-wide trends.

Conclusions: We find a significant association between the evolutionary rate of a protein and the degree to which non-synonymous changes cluster along its primary sequence. We show that clustering is a non-redundant predictor of evolutionary rate, and we speculate that conflicting signals of clustering and constraint may be indicative of a historical period of relaxed selection.

Highlights

  • Saccharomyces data and analysis: We obtained from Kellis et al. [16] the protein-coding genes and ortholog assignments for four Saccharomyces species: S. cerevisiae, S. paradoxus, S. mikatae, and S. bayanus. We considered only those proteins for which all four sequences were present. These were aligned using ClustalW and subjected to phylogenetic analysis assuming the fixed unrooted topology ((S. cerevisiae, S. paradoxus), (S. mikatae, S. bayanus)).

  • The dispersion ratio as a simple measure of clustering: We introduce the dispersion ratio, a measure of the degree to which non-synonymous changes are clustered in a pairwise alignment.
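
This excerpt does not give the formula for the dispersion ratio, so the following is only an illustrative sketch, not the authors' statistic. It assumes an index-of-dispersion style measure: the aligned sequence is cut into fixed, non-overlapping windows, amino acid differences are counted per window, and the variance of those counts is divided by their mean. The function names, the window size of 10, and the handling of gap columns are all assumptions made for the example.

```python
from statistics import mean, variance


def mismatch_indicator(seq_a, seq_b):
    """Return 1 for each aligned column where residues differ, 0 where they match.

    Columns containing a gap ('-') in either sequence are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return [int(a != b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]


def dispersion_ratio(indicator, window=10):
    """Variance-to-mean ratio of per-window change counts (hypothetical definition).

    Values above 1 suggest clustered changes; values below 1 suggest changes
    that are spread more evenly than expected by chance.
    Returns None when the ratio is undefined.
    """
    n_windows = len(indicator) // window  # trailing partial window is dropped
    counts = [sum(indicator[i * window:(i + 1) * window]) for i in range(n_windows)]
    if n_windows < 2 or mean(counts) == 0:
        return None
    return variance(counts) / mean(counts)


# Toy indicator vectors: ten changes packed together vs. ten changes spread out.
clustered = [1] * 10 + [0] * 30
even = [1, 0, 0, 0] * 10
```

Under this assumed definition, `dispersion_ratio(clustered)` is well above 1 and `dispersion_ratio(even)` is well below 1. The choice of window size affects the value, so any real analysis would need to follow the definition given in the paper's Methods.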


Introduction

Protein sequences are subject to a mosaic of constraint. Changes to functional domains and buried residues, for example, are more apt to disrupt protein structure and function than are changes to residues participating in loops or exposed to solvent. Regions of constraint on the tertiary structure of a protein often result in loose segmentation of its primary structure into stretches of slowly- and rapidly-evolving amino acids. This clustering can be exploited, and existing methods have done so by relying on local sequence conservation as a signature of selection to help identify functionally important regions within proteins. In a comparison of related sequences, domains may be apparent as regions of surprising similarity. This style of de novo annotation is exploited routinely and underlies a number of web-accessible methods, including but not limited to the Evolutionary Trace (ET) [3,4,5] and Evolution-Structure-Function analysis (ESF) [6,7]. More sophisticated de novo annotation schemes gain resolution through a combination of improved evolutionary models, accounting for site autocorrelation, and respecting spatial proximities induced by tertiary structure, e.g. [8,9,10].
