Abstract

Background: The presence of tandem amino acid repeats (AARs) is one of the signatures of eukaryotic proteins. AARs were thought to be frequently involved in bio-molecular interactions. Comprehensive studies that primarily focused on metazoan AARs have suggested that AARs are evolving rapidly and are highly variable among species. However, there is still controversy over causal factors of this inter-species variation. In this work, we attempted to investigate this topic mainly by comparing AARs in orthologous proteins from ten angiosperm genomes.

Results: Angiosperm AAR content is positively correlated with the GC content of the protein coding sequence. However, based on observations from fungal AARs and insect AARs, we argue that the applicability of this kind of correlation is limited by AAR residue composition and species' life history traits. Angiosperm AARs also tend to be fast evolving and structurally disordered, supporting the results of comprehensive analyses of metazoans. The functions of conserved long AARs are summarized. Finally, we propose that the rapid mRNA decay rate, alternative splicing and tissue specificity are regulatory processes that are associated with angiosperm proteins harboring AARs.

Conclusions: Our investigation suggests that GC content is a predictor of AAR content in the protein coding sequence under certain conditions. Although angiosperm AARs lack conservation and 3D structure, a fraction of the proteins that contain AARs may be functionally important and are under extensive regulation in plant cells.

Highlights

  • The presence of tandem amino acid repeats (AARs) is one of the signatures of eukaryotic proteins

  • Because short AARs may be derived from the interruption of a long AAR, we used Repeated Residues per Kilo Amino Acids (RRPK), defined as the ratio of the total AAR length to the protein length, multiplied by 1000, to represent the AAR content of a protein or protein segment

  • For Arabidopsis orthologous repeat-containing proteins (RCPs) and rice orthologous RCPs, a higher average RRPK was observed for protein segments encoded by alternatively spliced exons (Welch's t-test, p = 7.6 × 10⁻⁶ and 4.6 × 10⁻¹¹, respectively; Table S7 in Additional File 2), whereas RCPs were not more likely than non-RCPs to be alternatively spliced genes (23.2% vs. 24.3% and 31.3% vs. 37.7%, respectively; Fisher exact test, p = 0.31 and 7.7 × 10⁻³, respectively)
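
The RRPK definition in the highlights can be illustrated with a short sketch. It assumes, purely for illustration, that an AAR is a run of at least five identical consecutive residues; this excerpt does not state the detection threshold, so `MIN_RUN` and the function names `find_aars` and `rrpk` are hypothetical and not taken from the paper.

```python
from itertools import groupby

# Hypothetical threshold: the minimum run length that counts as an AAR.
# The excerpt does not give the value used in the study.
MIN_RUN = 5


def find_aars(seq, min_run=MIN_RUN):
    """Return (start, residue, length) for each run of identical residues
    that is at least min_run long. Start positions are 0-based."""
    runs = []
    pos = 0
    for residue, group in groupby(seq):
        length = sum(1 for _ in group)
        if length >= min_run:
            runs.append((pos, residue, length))
        pos += length
    return runs


def rrpk(seq, min_run=MIN_RUN):
    """Repeated Residues per Kilo Amino Acids:
    1000 * (total AAR length) / (protein or segment length)."""
    if not seq:
        return 0.0
    total_aar_length = sum(length for _, _, length in find_aars(seq, min_run))
    return 1000.0 * total_aar_length / len(seq)


# Toy 100-residue sequence: a 10-residue Q run and a 6-residue S run,
# separated by filler with no repeated neighbours. The 3-residue W run
# falls below the assumed threshold and is ignored.
toy = "M" + "Q" * 10 + "ACDEFGHIKL" * 8 + "S" * 6 + "W" * 3
print(find_aars(toy))  # [(1, 'Q', 10), (91, 'S', 6)]
print(rrpk(toy))       # 160.0
```

Because the measure sums all repeat residues rather than counting repeats, a long AAR split in two by a single substitution contributes almost the same RRPK as the intact repeat, provided both halves still meet the threshold.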


Summary

Introduction

The presence of tandem amino acid repeats (AARs) is one of the signatures of eukaryotic proteins. As repetitive DNA is very abundant in eukaryotic genomes [1], AARs are frequently found in the proteomes of eukaryotes [2,3,4]. These simple peptides can be encoded by tandem repeats of the same codon, which are vulnerable to point mutations, or by a mixture of synonymous codons [5]. These repetitive codon tracts are primarily introduced by either replication slippage [6] or recombination [7].
