Abstract

Quantifying evidence is an inherent aim of empirical science, yet the customary statistical methods in psychology do not communicate the degree to which the collected data serve as evidence for the tested hypothesis. To estimate the distribution of the strength of evidence that individual significant results offer in psychology, we calculated Bayes factors (BF) for 287,424 findings reported in 35,515 articles published in 293 psychological journals between 1985 and 2016. Overall, 55% of the analyzed results provided BF > 10 (often labeled strong evidence) for the alternative hypothesis, while more than half of the remaining results did not pass the level of BF = 3 (labeled anecdotal evidence). We estimate that at least 82% of all published psychological articles contain one or more significant results that do not provide BF > 10 for the hypothesis. We conclude that, because the threshold for accepting psychological findings has been set too low, a substantial proportion of published results have weak evidential support.
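
For readers unfamiliar with the measure, the Bayes factor is standardly defined as the ratio of how probable the observed data are under the two competing hypotheses:

    BF10 = p(data | H1) / p(data | H0)

so BF10 = 10 means the data are ten times more likely under the alternative than under the null. The labels used above follow a common convention: BF10 between 1 and 3 is read as anecdotal evidence, 3 to 10 as moderate, and above 10 as strong.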

Highlights

  • The reliability of evidence published in psychological science recently became a major concern as a high rate of published experiments failed to generate significant results upon replication [1], including some classic textbook studies [2]

  • While most often this low replicability of statistically significant findings is attributed to publication bias [3], questionable research practices [4] and flawed statistical analyses [5], replicability is conditional on the degree to which significant results in themselves constitute evidence for an experimental hypothesis

  • We provide a general overview of the evidential value of significant results in psychological journals

Introduction

The reliability of evidence published in psychological science has recently become a major concern, as a high rate of published experiments failed to generate significant results upon replication [1], including some classic textbook studies [2]. While this low replicability of statistically significant findings is most often attributed to publication bias [3], questionable research practices [4], and flawed statistical analyses [5], replicability is conditional on the degree to which significant results in themselves constitute evidence for an experimental hypothesis. To quantify their evidence, researchers in psychology and other social sciences use their data to distinguish true effects from random chance. They overwhelmingly use null-hypothesis significance testing (NHST), which estimates the probability of obtaining the observed data (or more extreme data) if the null hypothesis of no effect is true.
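
To make the distinction concrete, the sketch below (in Python, assuming NumPy and SciPy; an illustration, not the pipeline used in the study) recomputes both quantities from a single hypothetical reported result: the two-sided NHST p-value from the t statistic, and a default Bayes factor using the JZS formula of Rouder et al. (2009), evaluated by numerical integration.

    # Illustrative only: recompute a p-value and a default (JZS) Bayes factor
    # from a reported t statistic. The values t = 2.50, n = 40 are hypothetical.
    import numpy as np
    from scipy import stats
    from scipy.integrate import quad

    def jzs_bf10(t, n, r=0.707):
        """One-sample JZS Bayes factor BF10 (Rouder et al., 2009).

        t: observed t statistic; n: sample size;
        r: Cauchy prior scale (0.707 is the common default).
        """
        nu = n - 1  # degrees of freedom

        def integrand(g):
            c = 1.0 + n * g * r**2
            return (c**-0.5
                    * (1.0 + t**2 / (c * nu))**(-(nu + 1) / 2)
                    * (2 * np.pi)**-0.5 * g**-1.5 * np.exp(-1.0 / (2 * g)))

        m1, _ = quad(integrand, 0, np.inf)       # marginal likelihood under H1
        m0 = (1.0 + t**2 / nu)**(-(nu + 1) / 2)  # likelihood under H0
        return m1 / m0

    t, n = 2.50, 40
    p = 2 * stats.t.sf(abs(t), n - 1)  # two-sided p-value under the null
    print(f"p = {p:.3f}, BF10 = {jzs_bf10(t, n):.2f}")
    # The result is significant (p ~ .017), yet BF10 falls below 3: by the
    # conventional labels, only anecdotal evidence for the alternative.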
