Abstract

In scientific fields that use significance tests, statistical power is important for successful replications of significant results because it is the long-run success rate in a series of exact replication studies. For any population of significant results, there is a population of power values of the statistical tests on which conclusions are based. We give exact theoretical results showing how selection for significance affects the distribution of statistical power in a heterogeneous population of significance tests. In a set of large-scale simulation studies, we compare four methods for estimating population mean power of a set of studies selected for significance (a maximum likelihood model, extensions of p-curve and p-uniform, and z-curve). The p-uniform and p-curve methods performed well with a fixed effect size and varying sample sizes. However, when there was substantial variability in effect sizes as well as sample sizes, both methods systematically overestimated mean power. With heterogeneity in effect sizes, the maximum likelihood model produced the most accurate estimates when the distribution of effect sizes matched the assumptions of the model, but z-curve produced more accurate estimates when the assumptions of the maximum likelihood model were not met. We recommend the use of z-curve to estimate the typical power of significant results, which has implications for the replicability of significant results in psychology journals.

Highlights

  • In scientific fields that use significance tests, statistical power is important for successful replications of significant results because it is the long-run success rate in a series of exact replication studies

  • Average power can range from the criterion for a type-I error, if all significant results are false positives, to 100%, if the statistical power of original studies approaches 1

  • To claim that a finding has been replicated, a replication study should reproduce a statistically significant result, and the probability of a successful replication is a function of statistical power
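The range of average power described above can be illustrated with a small sketch. It assumes each study is selected with probability equal to its own power, so that mean power among the selected (significant) studies is the sum of squared power values divided by the sum of power values. The function name and the example values are hypothetical and are not taken from the paper.

```python
def mean_power_after_selection(powers):
    """Mean power among studies that reached significance.

    Assumption: a study with power p is selected with probability p,
    so selected studies are weighted by p. This gives
    sum(p^2) / sum(p) for a finite population of power values.
    """
    total = sum(powers)
    if total <= 0:
        raise ValueError("at least one power value must be positive")
    return sum(p * p for p in powers) / total


# Lower bound: every study is a false positive, so power equals alpha.
all_false_positives = mean_power_after_selection([0.05] * 10)

# Upper bound: every study has power of 1.
all_certain = mean_power_after_selection([1.0] * 3)

# Hypothetical mixed population: mean power before selection is 0.5.
mixed = mean_power_after_selection([0.2, 0.8])
```

In the mixed example, selection shifts the weight toward the high-powered study, so mean power after selection is higher than the unselected mean of 0.5.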


Summary

Introduction

In scientific fields that use significance tests, statistical power is important for successful replications of significant results because it is the long-run success rate in a series of exact replication studies. Information about the average power of studies is useful because selection for significance increases the type-I error rate and inflates effect sizes (Ioannidis, 2008). These biases are relatively small if the original studies had high power. To be clear, the statistical power of a test is defined as the probability of correctly rejecting the null hypothesis (Neyman & Pearson, 1933). This probability depends on the sampling error of a study and the population effect size. The traditional definition of power does not consider effect sizes of zero (false positives) because the goal of a priori power planning is to ensure that a non-zero effect can be demonstrated.
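As a rough illustration of how power depends on sampling error and population effect size, the sketch below computes the power of a two-sided z-test. The z-test setting, the function name, and the parameter names are assumptions chosen for simplicity; the text itself does not specify a particular test.

```python
from statistics import NormalDist


def z_test_power(effect_size, standard_error, alpha=0.05):
    """Probability that a two-sided z-test rejects the null hypothesis.

    Assumption: the test statistic is normal with unit variance and
    mean effect_size / standard_error (the noncentrality parameter).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ncp = effect_size / standard_error
    # Rejection in the upper tail plus rejection in the lower tail.
    upper_tail = 1 - std_normal.cdf(z_crit - ncp)
    lower_tail = std_normal.cdf(-z_crit - ncp)
    return upper_tail + lower_tail


# Smaller sampling error raises power for the same effect size.
power_noisy = z_test_power(effect_size=0.3, standard_error=0.2)
power_precise = z_test_power(effect_size=0.3, standard_error=0.1)

# With an effect size of zero the rejection rate equals alpha, the
# false-positive case that the traditional definition sets aside.
rejection_rate_null = z_test_power(effect_size=0.0, standard_error=0.1)
```

Halving the standard error doubles the noncentrality parameter, which is why the second call yields higher power than the first.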
