Abstract
Background: Alignment-free RNA quantification tools have significantly increased the speed of RNA-seq analysis. However, it is unclear whether these state-of-the-art pipelines can quantify small RNAs as accurately as long RNAs in the context of total RNA quantification.
Result: We comprehensively tested and compared four RNA-seq pipelines for the accuracy of gene quantification and fold-change estimation. We used a novel total RNA benchmarking dataset in which small non-coding RNAs are highly represented alongside long RNAs. The four pipelines consisted of two commonly used alignment-free pipelines and two variants of alignment-based pipelines. All pipelines quantified the expression of long, highly abundant genes with high accuracy. However, the alignment-free pipelines performed systematically worse in quantifying low-abundance genes and small RNAs.
Conclusion: Alignment-free and traditional alignment-based quantification methods perform similarly for common gene targets such as protein-coding genes. However, we have identified a potential pitfall in using alignment-free pipelines to analyze and quantify lowly expressed genes and small RNAs, especially when these small RNAs contain biological variations.
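Fold-change accuracy of a pipeline can be scored by comparing its estimated log2 fold changes between two samples against expected values, for example those implied by spike-ins with known concentration ratios. The sketch below is a minimal illustration, not the scoring procedure used in this study. The function names, the pseudocount of 1, and the use of root-mean-square error as the metric are all assumptions, and the inputs are assumed to be already normalized for library size (e.g., TPM).

```python
import math


def log2_fold_change(expr_a, expr_b, pseudocount=1.0):
    """Hypothetical helper: log2((B + p) / (A + p)).

    The pseudocount avoids log(0) for features with no signal in one sample.
    Inputs are assumed to be library-size-normalized expression values.
    """
    return math.log2((expr_b + pseudocount) / (expr_a + pseudocount))


def fold_change_rmse(expr_a, expr_b, expected_lfc, pseudocount=1.0):
    """Hypothetical metric: RMSE between estimated and expected log2 fold changes.

    Only features present in all three dictionaries are scored.
    """
    genes = sorted(set(expr_a) & set(expr_b) & set(expected_lfc))
    if not genes:
        raise ValueError("no features shared by all inputs")
    squared_errors = [
        (log2_fold_change(expr_a[g], expr_b[g], pseudocount) - expected_lfc[g]) ** 2
        for g in genes
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy, made-up values for illustration only.
expr_a = {"g1": 100.0, "g2": 50.0, "g3": 0.0}
expr_b = {"g1": 400.0, "g2": 50.0, "g3": 3.0}
expected = {"g1": 2.0, "g2": 0.0, "g3": 2.0}
rmse = fold_change_rmse(expr_a, expr_b, expected)
```

Here the pseudocount pulls the estimate for `g1` slightly below the expected value of 2, so `rmse` comes out small but nonzero. Running the same comparison separately on long and small RNAs, or on high- and low-abundance bins, is one way to compare how pipelines behave across those groups.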
Highlights
Alignment-free RNA quantification tools have significantly increased the speed of RNA-seq analysis
The benchmarking dataset we used here consists of thermostable group II intron reverse transcriptase (TGIRT)-seq libraries for four well-defined samples from the MicroArray/Sequencing Quality Control consortium (MAQC [18, 19]), each obtained in triplicate [15]
The MAQC samples A and B represent universal human reference total RNA and human brain reference total RNA, respectively, that are mixed with corresponding External RNA Controls Consortium (ERCC) spike-in transcripts
Summary
Alignment-free RNA quantification tools have significantly increased the speed of RNA-seq analysis. RNA-seq continues to pose great computational and statistical challenges, ranging from accurate alignment of sequencing reads to accurate inference of gene expression levels [1, 2]. In alignment-based methods, reads are assigned by aligning them to a reference genome, and relative gene expression levels are then inferred from the alignments at annotated gene loci [2, 7]. These methods are conceptually simple, but the read-alignment step can be time-consuming and computationally intensive despite recent advances in fast read aligners [4, 8, 9].
On the library-construction side, a newer method uses a thermostable group II intron reverse transcriptase (TGIRT) during RNA-seq library construction [15]. This method enables more comprehensive profiling of full-length structured small non-coding RNAs (sncRNAs) along with long RNAs in a single RNA-seq library workflow [15, 16, 17], which makes it possible to benchmark RNA-seq quantification tools on structured small non-coding RNAs.
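Once reads have been assigned to annotated features, relative expression is commonly reported as transcripts per million (TPM): each feature's count is divided by its length, and the resulting rates are rescaled to sum to one million. The sketch below is a minimal, hypothetical version of that calculation. It uses raw annotated lengths for simplicity, whereas real quantifiers typically use effective lengths adjusted for the fragment-length distribution; the feature names and numbers are made up.

```python
def tpm(counts, lengths):
    """Hypothetical helper: transcripts per million from per-feature read counts.

    counts:  feature -> number of assigned reads
    lengths: feature -> feature length (raw length here; an assumption)
    """
    # Reads per unit length for each feature.
    rates = {g: counts[g] / lengths[g] for g in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("all counts are zero")
    # Rescale so that the values across all features sum to 1e6.
    return {g: r / total * 1e6 for g, r in rates.items()}


# Toy example: a 2,000-nt mRNA with 1,000 reads and a 100-nt small RNA with 10 reads.
values = tpm({"long_mRNA": 1000, "small_RNA": 10},
             {"long_mRNA": 2000, "small_RNA": 100})
```

Because of the length normalization, each read assigned to the 100-nt feature contributes 20 times as much to its TPM as a read assigned to the 2,000-nt feature, so small differences in how reads are assigned to short features translate into large differences in their reported expression.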