Abstract

BackgroundBefore conducting a microarray experiment, one important issue that needs to be determined is the number of arrays required in order to have adequate power to identify differentially expressed genes. This paper discusses some crucial issues in the problem formulation, parameter specifications, and approaches that are commonly proposed for sample size estimation in microarray experiments. Common methods for sample size estimation are formulated as the minimum sample size necessary to achieve a specified sensitivity (proportion of detected truly differentially expressed genes) on average at a specified false discovery rate (FDR) level and specified expected proportion (π1) of the true differentially expression genes in the array. Unfortunately, the probability of detecting the specified sensitivity in such a formulation can be low. We formulate the sample size problem as the number of arrays needed to achieve a specified sensitivity with 95% probability at the specified significance level. A permutation method using a small pilot dataset to estimate sample size is proposed. This method accounts for correlation and effect size heterogeneity among genes.ResultsA sample size estimate based on the common formulation, to achieve the desired sensitivity on average, can be calculated using a univariate method without taking the correlation among genes into consideration. This formulation of sample size problem is inadequate because the probability of detecting the specified sensitivity can be lower than 50%. On the other hand, the needed sample size calculated by the proposed permutation method will ensure detecting at least the desired sensitivity with 95% probability. The method is shown to perform well for a real example dataset using a small pilot dataset with 4-6 samples per group.ConclusionsWe recommend that the sample size problem should be formulated to detect a specified proportion of differentially expressed genes with 95% probability. This formulation ensures finding the desired proportion of true positives with high probability. The proposed permutation method takes the correlation structure and effect size heterogeneity into consideration and works well using only a small pilot dataset.

Highlights

  • Before conducting a microarray experiment, one important issue that needs to be determined is the number of arrays required in order to have adequate power to identify differentially expressed genes

  • Many sample size estimation methods have been developed for various Type I error specifications, such as family-wise error rate (FWE) [1,2,3], false discovery rate (FDR) [4,5,6,7,8], and the number of false positives [7,9]

  • The Type I error rate was based on setting the FDR at q* = 0.05

Read more

Summary

Introduction

Before conducting a microarray experiment, one important issue that needs to be determined is the number of arrays required in order to have adequate power to identify differentially expressed genes. Before conducting a microarray experiment, one important issue that needs to be determined is the number of arrays (replicates) required in order to have adequate power to identify differentially expressed genes. The sample size for a microarray study is commonly calculated as the number of arrays needed to achieve the specified power on average (e.g., [3,4,5,6,9,10]). With the sample size estimate that is calculated to achieve a specified sensitivity on average, the proportion of truly differentially expressed genes detected would frequently be less than the average. Wang and Chen [2], Tsai et al

Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.