Abstract

The enduring replication crisis in many scientific disciplines casts doubt on the ability of science to estimate effect sizes accurately, and in a wider sense, to self-correct its findings and to produce reliable knowledge. We investigate the merits of a particular countermeasure—replacing null hypothesis significance testing (NHST) with Bayesian inference—in the context of the meta-analytic aggregation of effect sizes. In particular, we elaborate on the advantages of this Bayesian reform proposal under conditions of publication bias and other methodological imperfections that are typical of experimental research in the behavioral sciences. Moving to Bayesian statistics would not solve the replication crisis single-handedly. However, the move would eliminate important sources of effect size overestimation for the conditions we study.

Highlights

  • In recent years, several scientific disciplines have been facing a replication crisis: researchers fail to reproduce the results of previous experiments when copying the original experimental design

  • Numerous authors identify “classical” statistical inference based on Null Hypothesis Significance Testing (NHST) as a major cause of the replication crisis (Cohen 1994; Goodman 1999a; Ioannidis 2005; Ziliak and McCloskey 2008) and suggest statistical reforms

  • While science most likely needs a combination of these reforms to improve (e.g., Ioannidis 2005; Romero 2019), in this paper we study the case for statistical reform and its interaction with various methodological limitations of scientific research


Summary

Introduction

Several scientific disciplines have been facing a replication crisis: researchers fail to reproduce the results of previous experiments when copying the original experimental design. We ask whether the replicability of published research would change if we replaced the conventional NHST method with Bayesian inference. To address this question, we conduct a systematic computer simulation study that investigates the self-corrective nature of science in the context of statistical inference. Since different statistical frameworks (e.g., NHST and Bayesian inference) classify the same set of experimental results into different qualitative categories, e.g., “strong evidence for the hypothesis”, “moderate evidence”, “inconclusive evidence”, etc., the dominant statistical framework affects the form and extent of publication bias. This, in turn, affects the accuracy of meta-analytic effect size estimates and the validity of the self-corrective thesis (SCT*).
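To make the mechanism concrete, the following minimal Python sketch (not part of the original study; the effect size, sample sizes, study count, and significance threshold are hypothetical) simulates many identical two-group experiments and compares the meta-analytic effect size estimate when all results are aggregated with the estimate obtained when only statistically significant results are "published":

```python
import numpy as np
from scipy import stats

# Minimal illustration of how a significance-based publication filter can inflate
# meta-analytic effect size estimates. All parameter values are hypothetical.
rng = np.random.default_rng(1)

TRUE_EFFECT = 0.2    # true standardized mean difference (Cohen's d)
N_PER_GROUP = 30     # per-group sample size of each simulated experiment
N_STUDIES = 5000     # number of simulated experiments

observed_d = np.empty(N_STUDIES)
p_values = np.empty(N_STUDIES)

for i in range(N_STUDIES):
    control = rng.normal(0.0, 1.0, N_PER_GROUP)
    treatment = rng.normal(TRUE_EFFECT, 1.0, N_PER_GROUP)
    pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2.0)
    observed_d[i] = (treatment.mean() - control.mean()) / pooled_sd
    p_values[i] = stats.ttest_ind(treatment, control).pvalue

# Unfiltered aggregation: averaging all studies recovers the true effect.
print(f"true effect:              {TRUE_EFFECT:.3f}")
print(f"mean d, all studies:      {observed_d.mean():.3f}")

# NHST-style publication bias: only significant results enter the meta-analysis,
# so the aggregated estimate is biased upward.
significant = p_values < 0.05
print(f"mean d, significant only: {observed_d[significant].mean():.3f}")
```

The sketch only illustrates the selection effect under a significance filter; the paper's simulations additionally model how Bayesian evidence categories would change which results are suppressed.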

NHST and Bayesian inference
Model description and simulation design
Variable 1: sufficient versus limited resources
Variable 2: direction bias
Variable 3: suppressing inconclusive evidence
Results: the baseline condition
Extension 1: the probabilistic file drawer effect
Extension 2: a wider range of effect sizes
Discussion
Findings
Compliance with ethical standards
