Abstract

Background
The polyadenylation of RNA is critical for gene functioning, but the conserved sequence motifs (often called signal or signature motifs), motif locations and abundances, and base composition patterns around mRNA polyadenylation [poly(A)] sites are still uncharacterized in most species. The evolutionary tendencies of poly(A) site selection also remain largely unknown.

Results
We analyzed the poly(A) site regions of 31 species or phyla. Different groups of species showed different poly(A) signal motifs: UUACUU at the poly(A) site in the parasite Trypanosoma cruzi; UGUAAC (approximately 13 bases upstream of the site) in the alga Chlamydomonas reinhardtii; UGUUUG (or UGUUUGUU), mainly at the fourth base downstream of the poly(A) site, in the parasite Blastocystis hominis; and AAUAAA approximately 16 bases upstream of the poly(A) site in animals and approximately 19 bases upstream in plants. Polyadenylation signal motifs were usually several hundred times more abundant around poly(A) sites than in whole genomes. These predominant motifs usually had very specific locations upstream of, at, or downstream of poly(A) sites, depending on the species or phylum. In all analyzed species except B. hominis, the poly(A) site was usually an adenosine (A), although A predominance was weak in C. reinhardtii. Fungi, animals, plants, and the protist Phytophthora infestans shared a general base abundance (base composition) pattern of “U-rich, A-rich, U-rich, poly(A) site, U-rich” regions, or U-A-U-A-U for short, with some variation for each kingdom or subkingdom.

Conclusion
This study identified the poly(A) signal motifs, motif locations, and base composition patterns around mRNA poly(A) sites in protists, fungi, plants, and animals, and it provides insight into the evolution of poly(A) sites.
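The position-specific measurements summarized above (where a signal motif sits relative to the poly(A) site, and the per-position base composition that yields the U-A-U-A-U pattern) can be sketched roughly as follows. This is a minimal illustration, not the study's pipeline. It assumes each input window is an RNA-alphabet string (T already converted to U), that all windows have equal length, and that they are aligned so the poly(A) site sits at a fixed index `site_index`. The names `base_profile` and `motif_offsets` are hypothetical, and offsets here are measured from the motif's first base, which may differ from the convention behind the distances reported in the abstract.

```python
from collections import Counter

BASES = "ACGU"


def base_profile(windows: list[str]) -> list[dict[str, float]]:
    """Fraction of each base at every aligned position across equal-length windows."""
    if not windows:
        return []
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(w[i] for w in windows)
        # Only A/C/G/U are counted; N or other ambiguity codes are ignored.
        total = sum(counts[b] for b in BASES)
        profile.append({b: counts[b] / total if total else 0.0 for b in BASES})
    return profile


def motif_offsets(windows: list[str], motif: str, site_index: int) -> Counter:
    """Count motif occurrences by start offset relative to the site (negative = upstream)."""
    offsets: Counter = Counter()
    k = len(motif)
    for w in windows:
        for start in range(len(w) - k + 1):
            if w.startswith(motif, start):
                offsets[start - site_index] += 1
    return offsets


# Usage: two toy windows whose last base (index 13) is the poly(A) site.
windows = ["CAAUAAAUUUUUUA", "GAAUAAACUUUUCA"]
print(motif_offsets(windows, "AAUAAA", site_index=13))  # Counter({-12: 2})
print(base_profile(windows)[13])  # {'A': 1.0, 'C': 0.0, 'G': 0.0, 'U': 0.0}
```

A sharp peak in the offset histogram corresponds to the "very specific locations" described above, and plotting each base's fraction across `base_profile` positions would show alternating U-rich and A-rich stretches if the pattern is present.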

Highlights

  • The polyadenylation of RNA is critical for gene functioning, but the conserved sequence motifs, motif locations and abundances, and base composition patterns around mRNA polyadenylation [poly(A)] sites are still uncharacterized in most species

  • The major advantages of using the NCBI mRNA database for the mapping and quantitative analysis of poly(A) sites are as follows: i) sequence quality is high, because researchers usually verify a sequence that contains a poly(A) tail by repeatedly sequencing the same cDNA clone or clones before submission; ii) mRNA sequences are available for many species because sequences have accumulated over decades; and iii) the database contains long sequences

  • The major disadvantages of using Illumina RNA-Seq reads for this purpose are as follows: i) these reads are single-run reads without resequencing verification; ii) the sequences are short and become even shorter after the poly(A) tail has been removed; iii) these reads are more susceptible to internal priming in A-rich regions, because first-strand cDNA synthesis uses random hexamers (a pool that includes the oligo(dT) hexamer) at very low annealing temperatures (25°C for first-strand and 16°C for second-strand synthesis [Illumina Cat # RS930-1001]); iv) some artifactual poly(A) tails arise from the mRNA/cDNA fragmentation process; and v) reads of 100 bases or longer are currently available in the NCBI Sequence Read Archive (SRA) for only a limited number of species
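Because NCBI mRNA records of the kind described above retain their poly(A) tails, a candidate poly(A) site can be located by finding where the 3'-terminal A run begins. The sketch below is a hypothetical illustration, not the study's procedure: the minimum tail length `MIN_TAIL = 10` and the helper names `find_polya_tail` and `upstream_of_tail` are assumptions. It also does not resolve the ambiguity that arises when the encoded sequence itself ends in one or more A bases; distinguishing those from the tail would require alignment to the genome.

```python
import re
from typing import Optional

# Hypothetical threshold: the shortest 3'-terminal A run accepted as a poly(A) tail.
MIN_TAIL = 10

TAIL_PATTERN = re.compile(r"A+$")


def find_polya_tail(seq: str, min_tail: int = MIN_TAIL) -> Optional[int]:
    """Return the 0-based index where the 3'-terminal A run starts, or None if it is too short."""
    s = seq.strip().upper()
    m = TAIL_PATTERN.search(s)
    if m is None or len(m.group()) < min_tail:
        return None
    return m.start()


def upstream_of_tail(seq: str, n: int, min_tail: int = MIN_TAIL) -> Optional[str]:
    """Return up to n bases immediately 5' of the detected tail, or None if no tail is found."""
    start = find_polya_tail(seq, min_tail)
    if start is None:
        return None
    s = seq.strip().upper()
    return s[max(0, start - n):start]


# Usage: 7 encoded bases followed by a 12-nt tail.
mrna = "GCAUGCU" + "A" * 12
print(find_polya_tail(mrna))      # prints 7
print(upstream_of_tail(mrna, 3))  # prints GCU
```

Requiring a minimum run length is one simple way to avoid treating short, genomically encoded A stretches as tails; it does not address internal priming, which needs the downstream genomic sequence to detect.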


Summary

Introduction

The polyadenylation of RNA is critical for gene functioning, but the conserved sequence motifs (often called signal or signature motifs), motif locations and abundances, and base composition patterns around mRNA polyadenylation [poly(A)] sites are still uncharacterized in most species. Research is required to characterize motif types, locations, and abundances relative to the whole genome, as well as the conserved base composition patterns around poly(A) sites, in a large number of species from different kingdoms and subkingdoms. Many mRNAs have alternative poly(A) cleavage sites [3,12,4,13,14], but these alternative sites likely still use A[A/U]UAAA as a poly(A) signal [14]. This motif is recognized by the cleavage and polyadenylation specificity factor [15]. Although the AAUAAA motif is known to be less frequent in plants [24] and yeast [25] than in animals, it is not known whether AAUAAA is still the most conserved or most frequent hexanucleotide motif in this region in plants and other non-model species.
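Whether AAUAAA is the most frequent hexamer near poly(A) sites, and how strongly a motif is enriched relative to the whole genome, can be estimated with simple counts. The sketch below is illustrative only and all names in it are hypothetical. It ranks every overlapping hexamer in a set of region sequences, tallies the A[A/U]UAAA variants with a regular expression, and uses an assumed enrichment measure: the motif's per-position frequency in the regions divided by its per-position frequency in a background set, such as genomic sequence. The study's own enrichment definition may differ.

```python
import re
from collections import Counter

# Overlapping matches of the canonical signal and its main variant (AAUAAA, AUUAAA).
PAS_PATTERN = re.compile(r"(?=(A[AU]UAAA))")


def hexamer_counts(seqs: list[str]) -> Counter:
    """Count every overlapping hexamer across the sequences."""
    counts: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - 5):
            counts[s[i:i + 6]] += 1
    return counts


def count_pas_variants(seqs: list[str]) -> Counter:
    """Tally A[A/U]UAAA matches, keyed by the exact hexamer found."""
    counts: Counter = Counter()
    for s in seqs:
        for m in PAS_PATTERN.finditer(s):
            counts[m.group(1)] += 1
    return counts


def per_position_frequency(motif: str, seqs: list[str]) -> float:
    """Occurrences of motif divided by the number of positions where it could start."""
    k = len(motif)
    positions = sum(max(len(s) - k + 1, 0) for s in seqs)
    hits = sum(s.startswith(motif, i) for s in seqs for i in range(len(s) - k + 1))
    return hits / positions if positions else 0.0


def enrichment(motif: str, regions: list[str], background: list[str]) -> float:
    """Assumed measure: region frequency / background frequency (inf if absent from background)."""
    bg = per_position_frequency(motif, background)
    return per_position_frequency(motif, regions) / bg if bg else float("inf")


# Usage with toy sequences (not real data).
regions = ["UUAAUAAAUU", "CAAUAAAGCU"]
background = ["AAUAAA" + "C" * 94]
print(hexamer_counts(regions).most_common(1))  # [('AAUAAA', 2)]
print(count_pas_variants(["AUUAAACAAUAAA"]))
print(round(enrichment("AAUAAA", regions, background), 6))  # 19.0
```

Normalizing by the number of possible start positions keeps region and background frequencies comparable even though a genome is many orders of magnitude longer than the combined poly(A) site windows.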

