Abstract

Sample size calculation for adequate statistical power is critical in optimizing RNA-seq experimental design. However, directly estimating sample size becomes more complex when confounding covariates are taken into consideration. Although a number of approaches for sample size calculation have been proposed for RNA-seq data, most ignore potential heterogeneity. In this study, we implemented a simulation-based, confounder-adjusted method to provide sample size recommendations for RNA-seq differential expression analysis. The data were generated using Monte Carlo simulation, given an underlying distribution of confounding covariates and parameters for a negative binomial distribution. The relationship between sample size, power and parameters such as dispersion, fold change and mean read counts can be visualized. We demonstrate that the adjusted sample size for a desired power and type I error rate α is usually larger when confounding covariates are taken into account. More importantly, our simulation study reveals that existing methods may underestimate sample size if a confounding covariate exists in RNA-seq data. Consequently, this underestimate could reduce the detection power of the differential expression analysis. Therefore, we introduce confounding covariates into sample size estimation for heterogeneous RNA-seq data.
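The simulation idea in the abstract can be illustrated with a minimal sketch for a single gene. This is not the authors' implementation. It assumes a binary confounder whose prevalence may differ between the two groups, draws negative binomial counts as a gamma-Poisson mixture, and, to stay within the Python standard library, swaps in ordinary least squares on log-transformed counts with a normal-approximation Wald test in place of a negative binomial model. All function names, parameter names and default values (`estimate_power`, `confounder_fold`, `p_conf`, and so on) are hypothetical.

```python
import math
import random
from statistics import NormalDist


def rnbinom(rng, mean, dispersion):
    """One negative binomial draw (variance = mean + dispersion * mean**2)."""
    # Gamma-Poisson mixture: a gamma-distributed rate, then a Poisson count.
    lam = rng.gammavariate(1.0 / dispersion, dispersion * mean)
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:  # Knuth's multiplication method; fine for moderate means
        k += 1
        p *= rng.random()
    return k


def residualize(v, z):
    """Residuals of v after least-squares regression on z (with intercept)."""
    n = len(v)
    vbar, zbar = sum(v) / n, sum(z) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    szv = sum((zi - zbar) * (vi - vbar) for vi, zi in zip(v, z))
    slope = szv / szz if szz > 0 else 0.0  # constant z: only center v
    return [vi - vbar - slope * (zi - zbar) for vi, zi in zip(v, z)]


def adjusted_p_value(y, g, z):
    """Two-sided Wald p-value for the group effect on y, adjusted for z."""
    # Partial regression: regress residualized y on residualized g.
    y_r, g_r = residualize(y, z), residualize(g, z)
    sgg = sum(gi * gi for gi in g_r)
    if sgg < 1e-12:  # group and confounder collinear: effect not estimable
        return 1.0
    beta = sum(gi * yi for gi, yi in zip(g_r, y_r)) / sgg
    rss = sum((yi - beta * gi) ** 2 for gi, yi in zip(g_r, y_r))
    if rss <= 0.0:
        return 1.0
    se = math.sqrt(rss / (len(y) - 3) / sgg)  # 3 fitted parameters
    return 2.0 * (1.0 - NormalDist().cdf(abs(beta / se)))


def estimate_power(n_per_group, fold_change, mean=50.0, dispersion=0.1,
                   confounder_fold=2.0, p_conf=(0.5, 0.5),
                   alpha=0.05, n_sim=500, seed=0):
    """Monte Carlo power: share of simulated datasets with p < alpha.

    p_conf gives the assumed probability that the binary confounder equals 1
    in the control and treatment group, respectively.
    """
    rng = random.Random(seed)
    g = [0.0] * n_per_group + [1.0] * n_per_group
    hits = 0
    for _ in range(n_sim):
        z = [1.0 if rng.random() < p_conf[int(gi)] else 0.0 for gi in g]
        y = []
        for gi, zi in zip(g, z):
            mu = mean * fold_change ** gi * confounder_fold ** zi
            y.append(math.log(rnbinom(rng, mu, dispersion) + 1.0))
        hits += adjusted_p_value(y, g, z) < alpha
    return hits / n_sim


def smallest_n(target_power, candidates, **kwargs):
    """First per-group sample size in candidates reaching target_power."""
    for n in candidates:
        if estimate_power(n, **kwargs) >= target_power:
            return n
    return None


if __name__ == "__main__":
    for probs in [(0.5, 0.5), (0.2, 0.8)]:
        n = smallest_n(0.8, range(5, 61, 5), fold_change=1.5,
                       p_conf=probs, n_sim=300, seed=1)
        print("confounder prevalence", probs, "-> n per group:", n)
```

Running the script compares the per-group sample size needed for 80% power when the confounder is balanced across groups with the size needed when it is imbalanced. The exact numbers depend on the assumed parameters and on the simplified test used here, so they should be read as an illustration of the procedure rather than as recommendations.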

Highlights

  • Sample size and power are important factors for planning a biological experiment using high-throughput sequencing technologies for differential gene expression (RNA-seq)

  • We found that a larger sample size is required to achieve the desired 80% detection power when heterogeneous confounding variables exist

  • The methods described here illustrate how to estimate sample size when confounding variables are likely to exist in any complex RNA-seq experimental design


Introduction

Sample size and power are important factors for planning a biological experiment using high-throughput sequencing technologies for differential gene expression (RNA-seq). Larger sample sizes typically provide more accurate estimates of differential gene expression with higher confidence. With the rapid growth of RNA-seq studies, a number of sample size estimation methods and software tools have been proposed [1,2,3,4,5,6,7,8,9]. Each of these methods has its own limitations and assumptions. Li et al. (2013) extended sample size calculation methods using a Wald test, a score test and a likelihood ratio test (LRT) based

