Abstract

BackgroundPipeline comparisons for gene expression data are highly valuable for applied real data analyses, as they enable the selection of suitable analysis strategies for the dataset at hand. Such pipelines for RNA-Seq data should include mapping of reads, counting and differential gene expression analysis or preprocessing, normalization and differential gene expression in case of microarray analysis, in order to give a global insight into pipeline performances.MethodsFour commonly used RNA-Seq pipelines (STAR/HTSeq-Count/edgeR, STAR/RSEM/edgeR, Sailfish/edgeR, TopHat2/Cufflinks/CuffDiff)) were investigated on multiple levels (alignment and counting) and cross-compared with the microarray counterpart on the level of gene expression and gene ontology enrichment. For these comparisons we generated two matched microarray and RNA-Seq datasets: Burkitt Lymphoma cell line data and rectal cancer patient data.ResultsThe overall mapping rate of STAR was 98.98% for the cell line dataset and 98.49% for the patient dataset. Tophat’s overall mapping rate was 97.02% and 96.73%, respectively, while Sailfish had only an overall mapping rate of 84.81% and 54.44%. The correlation of gene expression in microarray and RNA-Seq data was moderately worse for the patient dataset (ρ = 0.67–0.69) than for the cell line dataset (ρ = 0.87–0.88). An exception were the correlation results of Cufflinks, which were substantially lower (ρ = 0.21–0.29 and 0.34–0.53). For both datasets we identified very low numbers of differentially expressed genes using the microarray platform. For RNA-Seq we checked the agreement of differentially expressed genes identified in the different pipelines and of GO-term enrichment results.ConclusionIn conclusion the combination of STAR aligner with HTSeq-Count followed by STAR aligner with RSEM and Sailfish generated differentially expressed genes best suited for the dataset at hand and in agreement with most of the other transcriptomics pipelines.

Highlights

  • Transcriptomics as an area in the research field of functional genomics has always been a key player for identifying interactions and regulations of gene expression

  • The correlation of gene expression in microarray and RNA sequencing (RNA-Seq) data was moderately worse for the patient dataset (ρ = 0.67–0.69) than for the cell line dataset (ρ = 0.87–0.88)

  • The comparisons were evaluated based on matched microarray and RNA-Seq profiles of two datasets: 1.) rectal cancer patient dataset comprising four patients with a good prognosis and six patients with a bad prognosis. 2.) Burkitt Lymphoma cell line dataset comprising three replicates of control cell line and three replicates of cell line stimulated with BAFF

Read more

Summary

Introduction

Transcriptomics as an area in the research field of functional genomics has always been a key player for identifying interactions and regulations of gene expression. Within the last ten years the generation sequencing (NGS) and especially RNA sequencing (RNA-Seq), became widely available [1] These technologies are gradually replacing microarrays, when analyzing and identifying complex mechanism in gene expression. The versatility in using RNA-Seq, like discovering novel small RNAs (smRNA), microRNA (miRNA), long-non-coding RNAs (lncRNA) or alternative splicing events [3], is a further factor for an increasing popularity of this profiling approach. Another advantage is the currently highly discussed variant calling [4] [5] [6] based on RNA-Seq data, which makes this technology even more attractive. Such pipelines for RNA-Seq data should include mapping of reads, counting and differential gene expression analysis or preprocessing, normalization and differential gene expression in case of microarray analysis, in order to give a global insight into pipeline performances

Objectives
Methods
Results
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.