Abstract

Background: Gene expression analysis by RNA sequencing is now widely used in a number of applications surveying the whole transcriptomes of cells and tissues. The recent introduction of ribosomal RNA depletion protocols, such as RiboZero, has extended the view of the polyadenylated transcriptome to the poly(A)- fraction of the RNA. However, substantial amounts of intronic transcriptional activity have been reported in RiboZero protocols, raising issues regarding their potential nuclear origin and the impact on the actual sequence depth in exonic regions.

Results: Using HEK293 human cells as source material, we assessed here the impact of the two commonly used RNA extraction methods and of the library construction protocols (rRNA depletion versus mRNA) on 1) the relative abundance of intronic reads and 2) the estimation of gene expression values. We benchmarked the rRNA depletion-based sequencing with a specific analysis of the cytoplasmic and nuclear transcriptome fractions, suggesting that the large majority of the intronic reads correspond to unprocessed nuclear transcripts rather than to independent transcriptional units. We show that Qiagen or TRIzol extraction methods differentially retain nuclear RNA species, and that consequently, rRNA depletion-based RNA sequencing protocols are particularly sensitive to the extraction methods.

Conclusions: We show that the combination of TRIzol-based RNA extraction with rRNA depletion sequencing protocols led to the largest fraction of intronic reads, after the sequencing of the nuclear transcriptome. We discuss here the impact of the various strategies on gene expression and alternative splicing estimation measures. Further, we propose guidelines and a double selection strategy for minimizing the expression biases, without loss of information.

Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-675) contains supplementary material, which is available to authorized users.

Highlights

  • Gene expression analysis by RNA sequencing is widely used in a number of applications surveying the whole transcriptomes of cells and tissues

  • Differences in intronic read abundance are protocol dependent: we carried out a comparative sequence analysis of the total, nuclear, poly(A)+ and cytoplasmic RNA fractions of HEK293 cells, extracted by either organic or non-organic methods, with the purpose of investigating the influence of RNA extraction and library preparation protocols on RNA sequencing (RNA-seq) data analysis (Figure 1)

  • A clear difference was seen for small RNAs (


Summary

Introduction

Gene expression analysis by RNA sequencing is widely used in a number of applications surveying the whole transcriptomes of cells and tissues. One essential issue for interpreting these differences is to be able to distinguish the intronic reads corresponding to unspliced immature precursor mRNA (hnRNA) from those defining distinct transcriptional units, such as long non-coding RNAs [8,9,12]. This is relevant since the concomitant presence of mature and immature transcripts will have a direct impact on downstream analysis of gene expression profiles. Using HEK293 human cells as source material, we set out to assess the influence of the RNA extraction methods (TRIzol versus silica gel) and of the library construction protocols (rRNA depletion versus poly(A)+ selection) on 1) the relative abundance of intronic reads and 2) the estimation of gene expression values. Based on the data generated, we discuss the respective performances of the different protocols in detecting the non-polyadenylated and non-coding fractions of the transcriptome, and their impact on the analysis of the transcriptome landscape in general.
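To make the notion of "relative abundance of intronic reads" concrete, the following is a minimal sketch of how such a fraction could be computed for a single gene. It is not the pipeline used in the study: the function names (`classify_read`, `intronic_fraction`), the midpoint-based assignment rule, and the toy coordinates are all assumptions chosen for illustration. Real analyses rely on an aligner and a full gene annotation.

```python
from bisect import bisect_right

def classify_read(start, end, gene, exons):
    """Classify one read, given as a half-open interval [start, end).

    gene  : (gene_start, gene_end), half-open
    exons : sorted, non-overlapping list of (exon_start, exon_end), half-open

    Assumption for this sketch: a read is assigned by its midpoint only.
    Returns 'exonic', 'intronic' or 'intergenic'.
    """
    mid = (start + end) // 2
    if not (gene[0] <= mid < gene[1]):
        return "intergenic"
    # Find the last exon whose start is <= mid, then check containment.
    exon_starts = [s for s, _ in exons]
    i = bisect_right(exon_starts, mid) - 1
    if i >= 0 and mid < exons[i][1]:
        return "exonic"
    return "intronic"

def intronic_fraction(reads, gene, exons):
    """Fraction of genic reads (exonic + intronic) that are intronic."""
    labels = [classify_read(s, e, gene, exons) for s, e in reads]
    genic = [label for label in labels if label != "intergenic"]
    if not genic:
        return 0.0
    return genic.count("intronic") / len(genic)

# Hypothetical toy gene with three exons; coordinates are made up.
toy_gene = (100, 1000)
toy_exons = [(100, 200), (400, 500), (900, 1000)]
toy_reads = [(110, 160), (250, 300), (420, 470), (600, 650), (1200, 1250), (50, 90)]

fraction = intronic_fraction(toy_reads, toy_gene, toy_exons)
print(f"intronic fraction of genic reads: {fraction:.2f}")
```

Comparing this fraction across libraries (for example, rRNA-depleted versus poly(A)+ selected) is one simple way to express the protocol dependence discussed above. The sketch cannot by itself distinguish unspliced precursor transcripts from independent transcriptional units located in introns.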

Methods
Results
Discussion
Conclusion
