Abstract

Background: Low-complexity regions (LCRs) in proteins are tracts that are highly enriched in one or a few amino acids. Given their high abundance and their capacity to expand in relatively short periods of time through replication slippage, they can greatly contribute to increasing protein sequence space and to generating novel protein functions. However, little is known about the global impact of LCRs on protein evolution.

Results: We have traced back the evolutionary history of 2,802 LCRs from a large set of homologous protein families from H. sapiens, M. musculus, G. gallus, D. rerio and C. intestinalis. Transcription factors and other regulatory functions are overrepresented in proteins containing LCRs. We have found that the gain of novel LCRs is frequently associated with repeat expansion, whereas the loss of LCRs is more often due to the accumulation of amino acid substitutions than to deletions. This asymmetry results in a net gain of protein sequence over time. We have detected a significant increase in the rate of accumulation of novel LCRs in the ancestral Amniota and mammalian branches, and a reduction in the chicken branch. Alanine- and/or glycine-rich LCRs are overrepresented among recently emerged LCRs in all branches, suggesting that their expansion is better tolerated than that of other LCR types. LCRs enriched in positively charged amino acids show the opposite pattern, indicating an important role for purifying selection in their maintenance.

Conclusion: We have performed the first large-scale study of the evolutionary dynamics of LCRs in protein families. The study has shown that the composition of an LCR is an important determinant of its evolutionary pattern.

Highlights

  • Low-complexity regions (LCRs) in proteins are tracts that are highly enriched in one or a few amino acids

  • To study the evolutionary dynamics of low-complexity regions (LCRs) in chordates, we obtained a large set of homologous genes from five genomes (Homo sapiens, Mus musculus, Gallus gallus, Danio rerio and Ciona intestinalis), which were clustered into 4,227 protein families using information from Ensembl Compara [35]

  • We identified low-complexity regions (LCRs) in the protein sequences with the program SEG (Wootton & Federhen, 1994) using optimized parameter settings for the detection of highly significant repetitive sequences
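SEG scores compositional complexity as the Shannon entropy of residue counts in a sliding window and flags windows that fall below a threshold. The sketch below illustrates only that trigger-style step; it is not SEG itself, which also applies an extension threshold and then refines segment boundaries. The window length of 12 and the threshold of 2.2 bits are illustrative values (SEG's commonly used defaults), not the optimized settings used in this study, which are not given here. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Compositional complexity of a window, in bits (Shannon entropy of residue counts)."""
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_segments(seq: str, window: int = 12, threshold: float = 2.2):
    """Hypothetical, simplified SEG-like scan.

    Slides a fixed-length window along seq and returns merged half-open
    (start, end) intervals covered by windows whose entropy is <= threshold.
    Boundary refinement, as done by the real SEG program, is not implemented.
    """
    segments = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= threshold:
            start, end = i, i + window
            # Merge with the previous interval when the windows overlap or touch.
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], max(segments[-1][1], end))
            else:
                segments.append((start, end))
    return segments


if __name__ == "__main__":
    protein = "MKTWYLPSE" + "Q" * 15 + "RNDCHGIVF"
    print(low_complexity_segments(protein))
```

Because a window needs only about seven identical residues out of twelve to fall below 2.2 bits, the flagged interval extends a few residues beyond the poly-Q tract itself. This is why the full program trims segment boundaries in a second step.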


Summary

Introduction

Low-complexity regions (LCRs) in proteins are tracts that are highly enriched in one or a few amino acids. Perfect single amino acid tandem repeats are the best-studied type of LCR. Such tracts are easy to search for in protein sequence libraries, with common minimum lengths of 4 or 5 repeat units. They are frequent in transcription factors [5,10,13], and experiments have shown that variations in the length of particular single amino acid repeat tracts, such as glutamine, proline or alanine repeats, can change the transcriptional activity of the protein containing them [14,15,16]. Potential roles for amino acid repeats in protein evolvability and multifunctionality have been explored theoretically [12].
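Perfect single amino acid tandem repeats are easy to search for because a regular expression with a backreference matches any residue repeated at least a given number of times. The following is a minimal sketch of that kind of search, assuming one-letter uppercase amino acid codes. The function name and the `min_units` parameter are hypothetical, with the cut-off set to one of the common values (4 or 5 units).

```python
import re


def find_homopeptide_repeats(seq: str, min_units: int = 5):
    """Hypothetical helper: find perfect single amino acid tandem repeats.

    Returns (residue, start, length) tuples for every maximal run of one
    residue that is at least min_units long (0-based start positions).
    """
    if min_units < 2:
        raise ValueError("min_units must be at least 2")
    # ([A-Z]) captures a residue; \1{k,} requires at least k further copies of it.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_units - 1))
    return [(m.group(1), m.start(), len(m.group(0))) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    protein = "MAAAAPQQQQQQGSSSS"
    print(find_homopeptide_repeats(protein, min_units=5))
    print(find_homopeptide_repeats(protein, min_units=4))
```

The regex is greedy, so each match covers the whole run. Lowering `min_units` from 5 to 4 shows how strongly the chosen cut-off affects which tracts are reported.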

