Abstract
Background: Sample size calculation is an important issue in the experimental design of biomedical research. For RNA-seq experiments, the sample size calculation method based on the Poisson model has been proposed; however, when there are biological replicates, RNA-seq data could exhibit variation significantly greater than the mean (i.e. over-dispersion). The Poisson model cannot appropriately model the over-dispersion, and in such cases, the negative binomial model has been used as a natural extension of the Poisson model. Because the field currently lacks a sample size calculation method based on the negative binomial model for assessing differential expression analysis of RNA-seq data, we propose a method to calculate the sample size.
Results: We propose a sample size calculation method based on the exact test for assessing differential expression analysis of RNA-seq data.
Conclusions: The proposed sample size calculation method is straightforward and not computationally intensive. Simulation studies to evaluate the performance of the proposed sample size method are presented; the results indicate our method works well, with achievement of desired power.
Highlights
Sample size calculation is an important issue in the experimental design of biomedical research.
One of the principal questions in designing an RNA-seq experiment is: What is the optimal number of biological replicates to achieve desired statistical power? (Note: In this article, the term “sample size” is used to refer to the number of biological replicates or number of subjects.) Because RNA-seq data are counts, the Poisson distribution has been widely used to model the number of reads obtained for each gene to identify differential gene expression [8,13].
Based on the negative binomial model, [14,15] proposed a quantile-adjusted conditional maximum likelihood procedure to create a pseudocount, which led to the development of an exact test for assessing the differential expression analysis of RNA-seq data.
Summary
Sample size calculation is an important issue in the experimental design of biomedical research. For RNA-seq experiments, the sample size calculation method based on the Poisson model has been proposed; however, when there are biological replicates, RNA-seq data could exhibit variation significantly greater than the mean (i.e. over-dispersion). Unlike the microarray chip, which offers only quantification of gene expression level, RNA-seq provides expression level data as well as differentially spliced variants, gene fusion, and mutation profile data. Such advantages have gradually elevated RNA-seq as the technology of choice among researchers. Based on the negative binomial model, [14,15] proposed a quantile-adjusted conditional maximum likelihood procedure to create a pseudocount, which led to the development of an exact test for assessing the differential expression analysis of RNA-seq data. [16] provided a Bioconductor package, edgeR, based on the exact test.
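To make the over-dispersion point concrete, the following minimal Python sketch compares the variance of a Poisson count with that of a negative binomial count that has the same mean. The symbols `mu` (mean read count) and `phi` (dispersion), the parameterization in which the negative binomial variance equals `mu + phi * mu**2`, and the example values are illustrative assumptions, not quantities taken from the article. The sketch does not implement the proposed sample size method; it only illustrates why a Poisson model understates the variability of replicated count data.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability of observing k reads when the mean is mu."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def nb_pmf(k, mu, phi):
    """Negative binomial probability of k reads with mean mu and dispersion phi.

    Assumed parameterization: size r = 1/phi and success probability
    p = r / (r + mu), which gives variance mu + phi * mu**2.
    """
    r = 1.0 / phi
    p = r / (r + mu)
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def mean_and_variance(pmf, upper=2000):
    """Numerically sum a pmf over 0..upper-1 to get its mean and variance."""
    probs = [pmf(k) for k in range(upper)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return mean, var


# Hypothetical values chosen only for illustration.
mu, phi = 10.0, 0.5

pois_mean, pois_var = mean_and_variance(lambda k: poisson_pmf(k, mu))
nb_mean, nb_var = mean_and_variance(lambda k: nb_pmf(k, mu, phi))

print(f"Poisson:           mean={pois_mean:.3f}, variance={pois_var:.3f}")
print(f"Negative binomial: mean={nb_mean:.3f}, variance={nb_var:.3f}")
```

Both distributions share the same mean, but the negative binomial variance grows with the square of the mean, so for a given effect size more biological replicates are typically needed than a Poisson-based calculation would suggest.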