Abstract
Background: Power analysis has become an inevitable step in the experimental design of current biomedical research. Complex designs that allow diverse correlation structures are commonly used in RNA-Seq experiments. However, the field currently lacks statistical methods to calculate sample size and estimate power for RNA-Seq differential expression studies using such designs. Simulation-based methods are well suited to fill this gap because they provide numerical solutions where the theoretical distributions of test statistics are typically unavailable.
Results: In this paper, we propose a novel simulation-based procedure for power estimation of differential expression that employs generalized linear mixed effects models for correlated expression data. We also propose a new procedure for power estimation of differential expression that uses a bivariate negative binomial distribution for paired designs. We compare the performance of the likelihood ratio test and the Wald test under a variety of simulation scenarios with the proposed procedures. The simulated distribution was used to estimate the null distribution of the test statistics in order to achieve the desired false positive control, and was compared to the asymptotic chi-square distribution. In addition, we applied the procedure for paired designs to the TCGA breast cancer data set.
Conclusions: We provide a framework for power estimation of RNA-Seq differential expression under complex experimental designs. Simulation results demonstrate that both proposed procedures properly control the false positive rate at the nominal level.
Highlights
Power analysis has become an inevitable step in the experimental design of current biomedical research
In general, the Wald test under the bivariate negative binomial (BNB) model has lower power at a 2-fold decrease and higher power at a 2-fold increase than the likelihood ratio test (LRT) under the BNB model and the LRT under the Poisson-LMM model
The LRT under the negative binomial (NB)-LMM model has lower power at almost all parameter values than the LRT under the BNB model and the LRT under the Poisson-LMM model
Summary
Power analysis has become an inevitable step in the experimental design of current biomedical research. The field currently lacks statistical methods to calculate sample size and estimate power for RNA-Seq differential expression studies that use complex designs with correlated data, so such methods need to be developed. Since the emergence of RNA-Seq data, several papers have used the Poisson or negative binomial (NB) distribution to model count-based expression data [7,8,9]. These methods are based on generalized linear models with fixed effects, so they cannot be directly applied to correlated expression data. Although the BNB distribution has not yet been used for RNA-Seq data analysis, it is a strong candidate for experiments using paired designs.
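The general idea of simulation-based power estimation for a paired design can be illustrated with a minimal sketch. This is not the procedure proposed in the paper: it does not fit a BNB model or a generalized linear mixed model, and it uses neither the LRT nor the Wald test. As stated assumptions, paired counts are generated from a shared gamma random effect with Poisson sampling (one common way to obtain correlated negative binomial counts), the test statistic is the absolute log ratio of total counts, and all function names and default parameter values are hypothetical. What it does mirror is the use of a simulated null distribution, rather than an asymptotic one, to set the rejection threshold.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson variate with Knuth's multiplication method (adequate for modest means)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_stat(rng, n_pairs, mu, fold, dispersion):
    """Simulate one paired data set for a single gene; return |log ratio of total counts|."""
    shape = 1.0 / dispersion
    total_ctrl = total_trt = 0
    for _ in range(n_pairs):
        # Assumption: a gamma effect with mean 1, shared within the pair, induces correlation.
        g = rng.gammavariate(shape, dispersion)
        total_ctrl += poisson(rng, mu * g)
        total_trt += poisson(rng, mu * fold * g)
    # Adding 0.5 to each total avoids log(0) when counts are very low.
    return abs(math.log((total_trt + 0.5) / (total_ctrl + 0.5)))


def estimate_power(n_pairs, mu=20.0, fold=2.0, dispersion=0.5,
                   alpha=0.05, n_sim=2000, seed=1):
    """Estimate power using a critical value taken from a simulated null distribution."""
    rng = random.Random(seed)
    # Step 1: simulate the statistic under the null (fold change of 1).
    null_stats = sorted(simulate_stat(rng, n_pairs, mu, 1.0, dispersion)
                        for _ in range(n_sim))
    # Step 2: take the empirical (1 - alpha) quantile as the rejection threshold.
    idx = min(n_sim - 1, math.ceil((1 - alpha) * n_sim) - 1)
    critical = null_stats[idx]
    # Step 3: simulate under the alternative and count rejections.
    hits = sum(simulate_stat(rng, n_pairs, mu, fold, dispersion) > critical
               for _ in range(n_sim))
    return hits / n_sim


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, estimate_power(n, fold=1.5))
```

Running the script prints an estimated power for each number of pairs at a 1.5-fold change. Calling `estimate_power` with `fold=1.0` gives a rough check that the simulated threshold keeps the rejection rate near the nominal level.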