Abstract
The test statistics underpinning several methods for combining p-values are special cases of the generalized mean p-value (GMP), including the minimum (Bonferroni procedure), harmonic mean and geometric mean. A key assumption influencing the practical performance of such methods concerns the dependence between p-values. Approaches that do not require specific knowledge of the dependence structure are practically convenient. Vovk and Wang derived significance thresholds for GMPs under the worst-case scenario of arbitrary dependence using results from Robust Risk Analysis (RRA). Here I calculate significance thresholds and closed testing procedures using the Generalized Central Limit Theorem (GCLT). GCLT formally assumes independence, but enjoys a degree of robustness to dependence. The GCLT thresholds are less stringent than RRA thresholds, with the disparity increasing as the exponent of the GMP (r) increases. I motivate a model of p-value dependence based on a Wishart-Multivariate-Gamma distribution for the underlying log-likelihood ratios. In simulations under this model, the RRA thresholds produced tests that were usually less powerful than Bonferroni, while the GCLT thresholds produced tests more powerful than Bonferroni, for all r > -∞. For r > -1, the GCLT thresholds suffered pronounced false positive rates. For r > -1/2, the standard central limit theorem applied and the GCLT thresholds no longer possessed any useful robustness to dependence. I consider the implications of these results in the context of various interpretations of GMPs, and conclude that the GCLT-based harmonic mean p-value procedure and Simes' (1986) test represent good compromises in the power-robustness trade-off for combining dependent tests.
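To make the special cases named in the abstract concrete, the following is a minimal sketch of the generalized mean of a set of p-values with exponent r, using the standard definition M_r = (mean of p_i^r)^(1/r), with the limits r → -∞ (minimum) and r → 0 (geometric mean). The function name `generalized_mean_p` and the example p-values are illustrative assumptions, not taken from the article. The sketch computes only the test statistic; it does not implement the GCLT or RRA significance thresholds, which are not given in this excerpt.

```python
import math


def generalized_mean_p(pvals, r):
    """Generalized mean M_r of p-values (hypothetical helper, for illustration).

    r = -inf gives the minimum, r = -1 the harmonic mean,
    r = 0 the geometric mean, r = 1 the arithmetic mean.
    """
    if not pvals:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvals):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvals)
    if r == -math.inf:
        return min(pvals)
    if r == math.inf:
        return max(pvals)
    if r == 0:
        # Limit r -> 0: exponential of the mean log p-value.
        return math.exp(sum(math.log(p) for p in pvals) / k)
    # Work in log space so that very negative r does not overflow p ** r.
    logs = [r * math.log(p) for p in pvals]
    top = max(logs)
    log_sum = top + math.log(sum(math.exp(x - top) for x in logs))
    return math.exp((log_sum - math.log(k)) / r)


# Made-up p-values, purely for illustration.
pvals = [0.01, 0.04, 0.20]
for r in (-math.inf, -2, -1, -0.5, 0, 1):
    print(f"r = {r}: M_r = {generalized_mean_p(pvals, r):.5f}")
```

The generalized mean is non-decreasing in r, so smaller exponents weight the smallest p-values more heavily. This is only the statistic: a combined test still requires a threshold calibrated under some assumption about dependence, which is the comparison the abstract describes.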
Highlights
Combining p-values is a convenient and widely used form of meta-analysis that aggregates evidence across studies or tests (e.g. refs 1, 2).
The results indicate that the power of the generalized mean p-value (GMP) to combine p-values, under relevant dependence assumptions at ε = 0.05, was better than the Bonferroni procedure for generalized central limit theorem (GCLT) thresholds, but worse than the Bonferroni procedure for robust risk analysis (RRA) thresholds.
Conclusions (Section 7): Taking the generalized mean p-value of a group of tests extends a number of existing methods for combining p-values, including the Bonferroni, Šidák, harmonic mean p-value and Fisher procedures (refs 7, 9–12; Figure 6).
Summary
AT&T Inc, Bedminster,
Keywords: Combined tests, p-values, generalized means, generalized central limit theorem, robust risk analysis, harmonic mean p-value, dependent tests