Abstract

Recent RNA-seq studies reveal that the transcriptomes of animals and plants are more complex than previously thought, leading to the inclusion of many more splice isoforms in annotated genomes. However, it is possible that a significant proportion of these transcripts are spurious isoforms that do not contribute to functional proteins. One current hypothesis is that commonly used mRNA extraction methods isolate both pre-mature (nuclear) mRNA and mature (cytoplasmic) mRNA, and that incompletely spliced pre-mature mRNAs account for a large proportion of these spurious transcripts. To investigate this, we compared a traditional RNA-seq dataset (total RNA-seq) and a ribosome-bound RNA-seq dataset (polyribosomal RNA-seq) from Arabidopsis thaliana. An integrative framework that combined de novo assembly and genome-guided assembly was applied to reconstruct transcriptomes for the two datasets. Up to 44.8% of the de novo assembled transcripts in the total RNA-seq sample were of low abundance, whereas only 0.09% in the polyribosomal RNA-seq de novo assembly were of low abundance. The final round of assembly using PASA (Program to Assemble Spliced Alignments) resulted in more transcript assemblies in the total RNA-seq sample than in the polyribosomal sample. Comparison of alternative splicing (AS) patterns between total and polyribosomal RNA-seq showed a significant difference (G-test, p-value < 0.01) in intron retention events: 46.4% of AS events in the total sample were intron retention, whereas only 23.5% showed evidence of intron retention in the polyribosomal sample. It is likely that a large proportion of retained introns in total RNA-seq result from incompletely spliced pre-mature mRNA. Overall, this study demonstrated that polyribosomal RNA-seq technology decreased the complexity and diversity of the coding transcriptome by eliminating pre-mature mRNAs, especially those of low abundance.
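
The intron retention comparison relies on a G-test, which can be illustrated with a minimal sketch. The function below computes a G-test of independence for a 2x2 table (intron retention vs. other AS events, total vs. polyribosomal sample) using only the Python standard library. The function name g_test_2x2 and the event counts are assumptions made for illustration: the abstract reports only percentages (46.4% and 23.5%), so the counts are hypothetical values chosen to mirror those proportions, and this is not necessarily how the authors ran their test.

```python
import math


def g_test_2x2(a, b, c, d):
    """G-test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (G, p), where p comes from the chi-square distribution
    with 1 degree of freedom. No continuity correction is applied.
    """
    observed = [[a, b], [c, d]]
    total = a + b + c + d
    row_sums = [a + b, c + d]
    col_sums = [a + c, b + d]

    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                e = row_sums[i] * col_sums[j] / total
                g += 2.0 * o * math.log(o / e)

    g = max(g, 0.0)  # guard against tiny negative rounding error
    # For 1 degree of freedom, P(chi2 > g) = erfc(sqrt(g / 2)).
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p


# HYPOTHETICAL counts: 1000 AS events per sample, split to match the
# reported percentages. The real event counts are not given in the abstract.
total_ir, total_other = 464, 536   # total RNA-seq: 46.4% intron retention
poly_ir, poly_other = 235, 765     # polyribosomal RNA-seq: 23.5% intron retention

g_stat, p_value = g_test_2x2(total_ir, total_other, poly_ir, poly_other)
print(f"G = {g_stat:.2f}, p = {p_value:.3g}")
```

With real data, the four arguments would be the observed event counts from each assembly; the G statistic scales with sample size, so the hypothetical counts above say nothing about the actual strength of the reported result.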

Highlights

  • As a result of the development of deep sequencing technology, our understanding of transcriptome complexity has been greatly improved during the past decade

  • Deeper sequencing provided by RNA-seq technology later led to the observation that up to 60% of intron-containing genes in Arabidopsis could be alternatively spliced under various conditions [3,4]

  • The polyribosomal data consisted of a total of nearly 96 million reads (101-nt read length; 5.8 GB single-end Illumina data for whole leaves) [16]


Summary

Introduction

As a result of the development of deep sequencing technology, our understanding of transcriptome complexity has greatly improved during the past decade. Although polyribosomal RNA technology is a powerful tool for interpreting post-transcriptional regulation of gene expression, the differential alternative splicing patterns between polyribosomal mRNA-seq and total mRNA-seq remain largely unexplored on a genome-wide basis and are the subject of this paper. We applied a combination of de novo and genome-guided assembly methods to reconstruct the transcriptomes for total RNA-seq and polyribosomal RNA-seq samples.

Results
Conclusion
