Abstract

A large number of computational methods have been developed for analyzing differential gene expression in RNA-seq data. We describe a comprehensive evaluation of common methods using the SEQC benchmark dataset and ENCODE data. We consider a number of key features, including normalization, accuracy of differential expression detection, and differential expression analysis when one condition has no detectable expression. We find significant differences among the methods, but note that array-based methods adapted to RNA-seq data perform comparably to methods designed for RNA-seq. Our results demonstrate that increasing the number of replicate samples improves detection power significantly more than increasing sequencing depth does.

Highlights

  • High-throughput sequencing technology is rapidly becoming the standard method for measuring RNA expression levels [1]

  • Our analysis focused on the measures most relevant to detecting differential gene expression from RNA-seq data: i) normalization of count data; ii) sensitivity and specificity of differential expression (DE) detection; iii) performance on the subset of genes that are expressed in one condition but have no detectable expression in the other; and iv) the effects of reduced sequencing depth and fewer replicates on the detection of differential expression

  • We evaluated the ability of the various methods to detect differentially expressed genes using both the External RNA Controls Consortium (ERCC) and TaqMan data
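Normalization of count data, the first measure listed above, corrects for differences in sequencing depth and library composition between samples. As one illustration, the median-of-ratios size factor popularized by DESeq is sketched below in plain Python. It is shown only as an example of count normalization, not as the procedure used in this evaluation, and the function names `size_factors` and `normalize` are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (DESeq-style), one per sample.

    counts: list of genes, each a list of raw read counts per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero and the ratio to it is undefined.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) == 0:
            continue
        # Log of the geometric mean of this gene across all samples.
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Each sample's factor is the median ratio to the per-gene reference.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / s for c, s in zip(gene, factors)] for gene in counts]


# Toy data: sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [50, 100], [0, 5]]
factors = size_factors(counts)
print(factors)                      # about [0.707, 1.414]
print(normalize(counts, factors))   # rows with nonzero counts become equal
```

Because every gene with nonzero counts is exactly twice as abundant in the second sample, the size factors come out in a 2:1 ratio and the normalized counts agree; the all-zero-sample edge case (no usable genes) is not handled in this sketch.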



Introduction

High-throughput sequencing of RNA (RNA-seq) is rapidly becoming the standard method for measuring RNA expression levels [1]. One of the main goals of these experiments is to identify the genes that are differentially expressed between two or more conditions. Such genes are selected by combining an expression-change threshold with a score cutoff, the latter usually based on P values generated by statistical modeling. The expression level of each RNA unit is measured by the number of sequenced fragments that map to the transcript, which is expected to correlate directly with its abundance. This measure is fundamentally different from that of probe-based methods such as microarrays. In RNA-seq, the expression signal of a transcript is limited by the sequencing depth and depends on the expression levels of other transcripts, whereas in array-based methods probe intensities are independent of each other.
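To make the selection rule concrete (an expression-change threshold combined with a P-value-based cutoff), the sketch below keeps genes whose absolute log2 fold change and multiple-testing-adjusted P value both pass fixed thresholds. The thresholds (|log2 fold change| ≥ 1, adjusted P ≤ 0.05), the use of Benjamini-Hochberg correction, and the function names `benjamini_hochberg` and `select_de` are assumptions for illustration; each DE method computes its own P values and may adjust them differently.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de(genes, log2fc, pvalues, min_abs_log2fc=1.0, max_padj=0.05):
    """Keep genes passing both the fold-change and adjusted-P thresholds."""
    padj = benjamini_hochberg(pvalues)
    return [g for g, fc, q in zip(genes, log2fc, padj)
            if abs(fc) >= min_abs_log2fc and q <= max_padj]


genes = ["a", "b", "c", "d"]
log2fc = [2.0, 0.5, -1.5, 3.0]
pvalues = [0.001, 0.0001, 0.02, 0.5]
print(select_de(genes, log2fc, pvalues))  # ['a', 'c']
```

Here gene `b` is highly significant but its change is too small, and gene `d` changes strongly but is not significant, so only genes meeting both criteria are reported.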
