Abstract

The gold standard for an empirical science is the replicability of its research results. But the estimated average replicability rate of the key-effects reported in top-tier psychology journals falls between 36 and 39% (objective vs. subjective rate; Open Science Collaboration, 2015). The standard mode of applying null-hypothesis significance testing (NHST) thus fails to adequately separate stable from random effects, and so NHST does not fully convince as a statistical inference strategy. We argue that the replicability crisis is “home-made”: more sophisticated strategies can deliver results whose successful replication is sufficiently probable. We can therefore overcome the replicability crisis by integrating empirical results into genuine research programs. Instead of continuing to narrowly evaluate only the stability of data against random fluctuations (context of discovery), such programs evaluate rival hypotheses against stable data (context of justification).

Highlights

  • Empirical psychology and the social sciences at large remain in crisis today, because many key-results cannot be replicated (Baker, 2015; Open Science Collaboration, 2015; Etz and Vandekerckhove, 2016)

  • With Lakatos (1978), we consider a research program progressive as long as a theoretical construction or its core-preserving modification generate predictions that are at least partially corroborated by new data of sufficient induction quality

  • The replicability crisis in psychology is in large part a consequence of applying an unsophisticated version of null-hypothesis significance testing (NHST)

Summary

INTRODUCTION

Empirical psychology and the social sciences at large remain in crisis today, because (too) many key-results cannot be replicated (Baker, 2015; Open Science Collaboration, 2015; Etz and Vandekerckhove, 2016). ‘Gauging corroboration quality’ refers to evaluating the degree to which probably replicable data support one hypothesis more than another. Before applying these distinctions, we define:

Def. induction quality: a measure of the sensitivity of an empirical set-up (given two specified point-hypotheses and a fixed sample size), stated as the α- and β-error.

To prepare for a critical discussion, consider that both meta-analyses sought to discover a non-random effect, but:

  • neither tested the psi-hypothesis in the sense of gauging L(H|D);
  • effect sizes are heterogeneous, suggesting that uncontrolled influences are at play;
  • Bem’s own studies report larger effects than their independent replications, suggesting a self-fulfilling prophecy;
  • Bayes-t-tests, as we saw, depend on the prior distribution, and different priors can lead to contradictory results;
  • most studies included in these meta-analyses are individually underpowered.
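The last two points can be made concrete numerically. The sketch below is illustrative Python, not part of the original paper, and the scenario values (effect size d = 0.2, n = 50 or 100, prior scales τ = 0.1 and 1.0) are assumptions chosen for demonstration. It approximates the power (1 − β) of a two-sided two-sample t-test via the normal approximation, and computes a simple Bayes factor for a normal mean under a normal prior, showing that the same “significant” data can favor H1 under a narrow prior but H0 under a wide one.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)

def power_two_sample(d, n, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided two-sample t-test
    for standardized effect size d and n subjects per group,
    using the normal approximation to the t distribution."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    ncp = d * (n / 2) ** 0.5  # noncentrality parameter for equal groups
    return (1 - STD_NORMAL.cdf(z_crit - ncp)) + STD_NORMAL.cdf(-z_crit - ncp)

def bf01(xbar, n, tau):
    """Bayes factor BF01 for H0: mu = 0 vs. H1: mu ~ N(0, tau^2),
    given a sample mean xbar of n unit-variance observations."""
    m0 = NormalDist(0, (1 / n) ** 0.5).pdf(xbar)            # marginal likelihood under H0
    m1 = NormalDist(0, (tau ** 2 + 1 / n) ** 0.5).pdf(xbar)  # marginal likelihood under H1
    return m0 / m1

# A small effect (d = 0.2) with 50 subjects per group yields power
# of only about 17%: low induction quality.
print(power_two_sample(0.2, 50))

# The same sample mean (xbar = 0.2, n = 100, i.e. z = 2, p < .05)
# yields contradictory Bayesian verdicts under different priors:
print(bf01(0.2, 100, tau=0.1))  # < 1: evidence favors H1
print(bf01(0.2, 100, tau=1.0))  # > 1: evidence favors H0
```

The second pair of calls illustrates the prior-sensitivity point directly: a conventionally significant result can count as evidence for or against the null depending solely on the width of the prior placed on the alternative.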

