Abstract

RNA-sequencing (RNA-seq) is an essential technique for transcriptome studies, and hundreds of analysis tools have been developed since its debut. Although recent efforts have assessed the latest available tools, they have not evaluated analysis workflows comprehensively enough to unleash the full power of RNA-seq. Here we conduct an extensive study of a broad spectrum of RNA-seq workflows. Going beyond expression analysis, our work also assesses RNA variant-calling, RNA editing and RNA fusion detection techniques. Specifically, we examine both short- and long-read RNA-seq technologies and 39 analysis tools (yielding ~120 combinations), performing ~490 analyses on 15 samples that span a variety of germline, cancer and stem cell data sets. We report their performance and propose a comprehensive RNA-seq analysis protocol, named RNACocktail, along with a computational pipeline that achieves high accuracy. Validation on different samples reveals that the proposed protocol could help researchers extract more biologically relevant predictions through broad analysis of the transcriptome.

Highlights

  • RNA-sequencing (RNA-seq) is an essential technique for transcriptome studies, and hundreds of analysis tools have been developed since its debut

  • Our analysis reveals the significance of the proposed pipeline in gaining biological insights concerning the transcriptome

  • We elaborate on each step in detail


Summary

Results

For StringTie and IDP, genes predicted with more introns were more likely to represent novel isoforms, consistent with previous studies using long reads[29, 30] (Supplementary Fig. 12). The long-read-based techniques IDP and IsoSeq predicted many novel isoforms and known reference transcripts that were not detected by any short-read-based technique (Supplementary Fig. 22). We compared the performance of the genome-alignment-based tools StringTie and Cufflinks (using different aligners); the transcriptome-alignment-based tools eXpress and Salmon-Aln; the alignment-free tools kallisto, Sailfish (with quasi-mapping), Salmon-SMEM and Salmon-Quasi; and the long-read-based technique IDP (using different short-read and long-read aligners). Among the short-read-based techniques, the two alignment-free tools kallisto and Salmon-SMEM produced the most consistent predictions across the MCF7-100 and MCF7-300 samples, in agreement with results in ref.
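The excerpt does not say which statistic was used to score cross-sample consistency. As one illustration, a common choice is the Spearman rank correlation between two samples' abundance estimates over the transcripts quantified in both. The sketch below assumes that metric; the function names, transcript IDs and TPM values are hypothetical and are not taken from the study or from any tool's output format.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / sqrt(vx * vy)


def spearman_consistency(abund_a, abund_b):
    """Spearman correlation over transcripts present in both samples (assumed metric)."""
    shared = sorted(set(abund_a) & set(abund_b))
    xs = [abund_a[t] for t in shared]
    ys = [abund_b[t] for t in shared]
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical TPM estimates from one tool on two samples.
mcf7_100 = {"tx1": 10.0, "tx2": 5.0, "tx3": 0.0, "tx4": 20.0}
mcf7_300 = {"tx1": 12.0, "tx2": 4.0, "tx3": 1.0, "tx4": 25.0, "tx5": 3.0}
print(spearman_consistency(mcf7_100, mcf7_300))  # 1.0: identical rank order
```

Rank-based correlation is used here because abundance estimates are heavily skewed, so a few highly expressed transcripts would dominate a Pearson correlation on raw values. Restricting to shared transcripts is itself an assumption. A stricter comparison might instead treat transcripts missing from one sample as zero.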

Discussion
Methods
