Sample size calculation while controlling false discovery rate for differential expression analysis with RNA-sequencing experiments.

Ran Bi,Peng Liu

doi:10.1186/s12859-016-0994-9

Abstract

BackgroundRNA-Sequencing (RNA-seq) experiments have been popularly applied to transcriptome studies in recent years. Such experiments are still relatively costly. As a result, RNA-seq experiments often employ a small number of replicates. Power analysis and sample size calculation are challenging in the context of differential expression analysis with RNA-seq data. One challenge is that there are no closed-form formulae to calculate power for the popularly applied tests for differential expression analysis. In addition, false discovery rate (FDR), instead of family-wise type I error rate, is controlled for the multiple testing error in RNA-seq data analysis. So far, there are very few proposals on sample size calculation for RNA-seq experiments.ResultsIn this paper, we propose a procedure for sample size calculation while controlling FDR for RNA-seq experimental design. Our procedure is based on the weighted linear model analysis facilitated by the voom method which has been shown to have competitive performance in terms of power and FDR control for RNA-seq differential expression analysis. We derive a method that approximates the average power across the differentially expressed genes, and then calculate the sample size to achieve a desired average power while controlling FDR. Simulation results demonstrate that the actual power of several popularly applied tests for differential expression is achieved and is close to the desired power for RNA-seq data with sample size calculated based on our method.ConclusionsOur proposed method provides an efficient algorithm to calculate sample size while controlling FDR for RNA-seq experimental design. We also provide an R package ssizeRNA that implements our proposed method and can be downloaded from the Comprehensive R Archive Network (http://cran.r-project.org).Electronic supplementary materialThe online version of this article (doi:10.1186/s12859-016-0994-9) contains supplementary material, which is available to authorized users.

Highlights

RNA-Sequencing (RNA-seq) experiments have been popularly applied to transcriptome studies in recent years
In a typical RNA-seq experiment, messenger RNA molecules are extracted from samples, fragmented, and reverse transcribed to double-stranded complementary DNA
In the ‘Results and discussion’ section, we present four simulation studies based on either negative binomial distributions or real RNA-seq dataset, and our method provide reliable sample sizes for all simulation studies

Summary

Introduction

RNA-Sequencing (RNA-seq) experiments have been popularly applied to transcriptome studies in recent years. Power analysis and sample size calculation are challenging in the context of differential expression analysis with RNA-seq data. One challenge is that there are no closed-form formulae to calculate power for the popularly applied tests for differential expression analysis. Compared with microarray technologies that used to be the major tool for transcriptome studies, RNA-seq technologies have several advantages including a larger dynamic range of expression levels, less noise, higher throughput, and more power to detect gene fusions, single nucleotide variants and novel transcripts. In a typical RNA-seq experiment, messenger RNA (mRNA) molecules are extracted from samples, fragmented, and reverse transcribed to double-stranded complementary DNA (cDNA).

Methods

Results

Conclusion