Abstract
Summary: Next-generation sequencing platforms for measuring digital expression such as RNA-Seq are displacing traditional microarray-based methods in biological experiments. The detection of differentially expressed genes between groups of biological conditions has led to the development of numerous bioinformatics tools, but so far, few exploit the expanded dynamic range afforded by the new technologies. We present edgeRun, an R package that implements an unconditional exact test that is a more powerful version of the exact test in edgeR. This increase in power is especially pronounced for experiments with as few as two replicates per condition, for genes with low total expression and with large biological coefficient of variation. In comparison with a panel of other tools, edgeRun consistently captures functionally similar differentially expressed genes.
Availability and implementation: The package is freely available under the MIT license from CRAN (http://cran.r-project.org/web/packages/edgeRun).
Contact: edimont@mail.harvard.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Highlights
Next-generation sequencing technologies are steadily replacing microarray-based methods, for instance transcriptome capture with RNA-Seq (Mortazavi et al., 2008) and promoterome capture with CAGE-Seq (Kanamori-Katayama et al., 2011).
Robinson et al. (2010) proposed edgeR, an R package that eliminates the nuisance mean expression parameter by conditioning on a sufficient statistic for the mean, a strategy first popularized by Fisher (1925) for the binomial distribution.
We propose an alternative, more powerful approach, which we call ‘unconditional edgeR’ or edgeRun. It eliminates the nuisance mean parameter without conditioning, by maximizing the exact P-value over all possible values of the mean.
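To make the idea of maximizing the exact P-value over the nuisance mean concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not edgeRun's implementation: the function names, the test statistic |S1 - S2|, the fixed and known dispersion, the finite grid of candidate means, and the truncation of the count range are all choices made here for brevity.

```python
import math

def nb_pmf(k, mean, size):
    """Negative binomial pmf parameterized by mean and size (size = 1 / dispersion)."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(size / (size + mean))
             + k * math.log(mean / (size + mean)))
    return math.exp(log_p)

def tail_prob(diff_obs, mean, size, k_max):
    """P(|S1 - S2| >= diff_obs) for S1, S2 iid NB(mean, size), counts truncated at k_max."""
    pmf = [nb_pmf(k, mean, size) for k in range(k_max + 1)]
    if diff_obs == 0:
        # Every pair (a, b) is at least as extreme as a difference of zero.
        return sum(pmf) ** 2
    # cdf[i] = P(S <= i - 1), so cdf[0] = 0 and cdf[-1] is the truncated total mass.
    cdf = [0.0]
    for p in pmf:
        cdf.append(cdf[-1] + p)
    total = 0.0
    for a, pa in enumerate(pmf):
        lower = cdf[max(a - diff_obs + 1, 0)]                # P(S2 <= a - diff_obs)
        upper = cdf[-1] - cdf[min(a + diff_obs, k_max + 1)]  # P(a + diff_obs <= S2 <= k_max)
        total += pa * (lower + upper)
    return total

def unconditional_p(s1, s2, size, mean_grid, k_max=200):
    """Unconditional exact P-value: the largest tail probability over the candidate means."""
    diff_obs = abs(s1 - s2)
    return max(tail_prob(diff_obs, m, size, k_max) for m in mean_grid)

# Hypothetical settings: dispersion 0.1 (size 10), candidate means 0.5, 1.0, ..., 30.0.
MEAN_GRID = [0.5 * i for i in range(1, 61)]
SIZE = 10.0

print(unconditional_p(3, 12, SIZE, MEAN_GRID))
```

A conditional test would instead fix the total s1 + s2 and compute the tail probability within that slice only. The sketch above keeps the full joint distribution of the two counts and takes the worst case over the unknown common mean; the grid and the truncation at `k_max` make that maximum approximate.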
Summary
Next-generation sequencing technologies are steadily replacing microarray-based methods, for instance transcriptome capture with RNA-Seq (Mortazavi et al., 2008) and promoterome capture with CAGE-Seq (Kanamori-Katayama et al., 2011). By far the simplest and most popular approach reduces differential expression to a pairwise comparison of mean parameters, yielding a fold-change estimate and a P-value to assess the statistical significance of the finding. To address this problem, tools such as edgeR (Robinson et al., 2010) and DESeq (Love et al., 2014), among many others, have been developed and can be applied to any experiment in which digital count data are produced. Traditional benchmarking metrics such as the false positive rate and power are useful but limited, as they are purely statistical concepts that can only be tested on simulated data. They do not help in determining to what extent methods deliver truly biologically important genes. We demonstrate that, even though it may be less statistically powerful than DESeq in some simulation cases, edgeRun produces results that are functionally more relevant.