Abstract

Background: The replication crisis hit the medical sciences about a decade ago, yet most of the flaws inherent in null hypothesis significance testing (NHST) remain unsolved today. Although the drawbacks of p-values have been discussed at length in many venues, only a few attractive alternatives to p-values and NHST have been proposed for clinical research. Bayesian methods are one of them, and they are gaining increasing attention in medical research. In contrast to the frequentist framework, they describe model parameters in terms of probability and allow prior information to be incorporated. While Bayesian methods are not the only remedy, there is growing agreement that they are an essential means of avoiding common misconceptions and misinterpretations of study results. Applying Bayesian statistics no longer requires detailed programming knowledge, because simple point-and-click programs such as JASP are now available. Still, there are many Bayesian significance and effect measures that compete with the p-value, the gold standard of significance in medical research. This multitude has led to a lack of agreement on which measure to report.

Methods: We therefore conduct an extensive simulation study comparing common Bayesian significance and effect measures that can be obtained from a posterior distribution. We analyse the behaviour of these measures for one of the most important statistical procedures in medical research, and in clinical trials in particular: the two-sample Student's (and Welch's) t-test.

Results: Some measures cannot express evidence for both the null and the alternative hypothesis. The indices behave similarly as sample size and noise increase. Prior modelling, however, influences the results, and extreme priors allow cherry-picking similar to p-hacking in the frequentist paradigm. The indices differ considerably in their ability to control the type I error rate and in their ability to detect an existing effect.

Conclusion: Based on these results, two of the commonly used indices can be recommended for more widespread use in clinical and biomedical research. They improve type I error control compared with the classic two-sample t-test and have several other desirable properties.
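The abstract refers to significance and effect measures "obtained from a posterior distribution" without showing how such a measure is computed. The snippet below is a minimal sketch of two such indices, computed from a list of posterior draws of the mean difference. The first is the probability of direction. The second is the share of posterior mass inside a region of practical equivalence (ROPE). Several inputs are assumptions and are not taken from the paper. The draws come from a Normal(0.3, 0.15) posterior that stands in for MCMC output. The ROPE bounds of ±0.1 on a standardized scale are illustrative. The function names are hypothetical.

```python
import random
from statistics import NormalDist

def probability_of_direction(draws):
    """Share of posterior draws on the dominant side of zero (between 0.5 and 1)."""
    positive = sum(d > 0 for d in draws) / len(draws)
    return max(positive, 1.0 - positive)

def rope_share(draws, low, high):
    """Share of posterior draws inside [low, high], using all draws (no credible interval)."""
    return sum(low <= d <= high for d in draws) / len(draws)

# Assumed posterior for the standardized mean difference: Normal(0.3, 0.15).
# In practice these draws would come from an MCMC sampler fitted to trial data.
rng = random.Random(2024)
draws = [rng.gauss(0.3, 0.15) for _ in range(50_000)]

pd = probability_of_direction(draws)
in_rope = rope_share(draws, -0.1, 0.1)  # hypothetical ROPE of [-0.1, 0.1]

# Closed-form values for the same normal posterior, as a sanity check.
post = NormalDist(0.3, 0.15)
pd_exact = 1.0 - post.cdf(0.0)
rope_exact = post.cdf(0.1) - post.cdf(-0.1)
print(f"pd = {pd:.3f} (exact {pd_exact:.3f}), "
      f"ROPE share = {in_rope:.3f} (exact {rope_exact:.3f})")
```

Both functions only need posterior samples, so the same code applies to draws from any model. The ROPE share here uses the whole posterior rather than a 95% interval, which is one common way of defining a "full" ROPE index.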

Highlights

  • The replication crisis hit the medical sciences about a decade ago, yet most of the flaws inherent in null hypothesis significance testing (NHST) remain unsolved today

  • This paper studied the behaviour of common Bayesian significance and effect size indices in the setting of the two-sample Welch's t-test, which is often applied to clinical trial data

  • The influence of sample size n, noise ε and prior modelling is similar for all three indices, but the type I error rate control is better for the full region of practical equivalence (ROPE)
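The toy simulation below illustrates what type I error control means for posterior-based indices. It is our own assumption and does not reproduce the authors' design. It draws two groups with identical means and approximates the posterior of the mean difference by a normal distribution. That distribution is centred on the observed difference and uses Welch's unpooled standard error. The simulation then counts how often two hypothetical decision rules flag an effect:

- pd > 0.975, which roughly mirrors a two-sided test at α = 0.05.
- A full-ROPE share below 2.5%.

The ROPE of [-0.1, 0.1], both thresholds and the normal approximation are illustrative choices. The parameter `sd` plays the role of the noise ε.

```python
import random
from statistics import NormalDist

def sample_mean_var(xs):
    """Sample mean and unbiased sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

def simulate_type_i_error(n=30, sd=1.0, n_sim=4000, rope=(-0.1, 0.1), seed=7):
    """Estimate how often two hypothetical rules flag an effect when none exists."""
    rng = random.Random(seed)
    reject_pd = 0
    reject_rope = 0
    for _ in range(n_sim):
        # Both groups share the same true mean, so every detection is a false positive.
        mx, vx = sample_mean_var([rng.gauss(0.0, sd) for _ in range(n)])
        my, vy = sample_mean_var([rng.gauss(0.0, sd) for _ in range(n)])
        se = (vx / n + vy / n) ** 0.5       # Welch (unpooled) standard error
        post = NormalDist(mx - my, se)      # normal approximation to the posterior
        p_pos = 1.0 - post.cdf(0.0)
        pd = max(p_pos, 1.0 - p_pos)        # probability of direction
        in_rope = post.cdf(rope[1]) - post.cdf(rope[0])
        reject_pd += int(pd > 0.975)        # hypothetical rule mimicking alpha = 0.05
        reject_rope += int(in_rope < 0.025) # hypothetical full-ROPE rule
    return reject_pd / n_sim, reject_rope / n_sim

rate_pd, rate_rope = simulate_type_i_error()
print(f"type I error rate: pd rule {rate_pd:.3f}, full-ROPE rule {rate_rope:.3f}")
```

No true effect exists in this simulation, so both returned values are false-positive rates. Changing `n`, `sd` or the ROPE bounds shows how strongly such rates depend on these choices.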


Summary

Introduction

The replication crisis hit the medical sciences about a decade ago, yet most of the flaws inherent in null hypothesis significance testing (NHST) remain unsolved today. There are many Bayesian significance and effect measures that compete with the p-value, the gold standard of significance in medical research, and this multitude has led to a lack of agreement on which measure to report. The goal of a study can often be defined as testing the efficacy of a new treatment or medication and estimating the size of its effect. Common designs use a treatment group and a control group, and the aim is to measure differences in a response variable such as blood pressure. In medical research, the p-value is the gold standard for deciding whether a new treatment or drug is more effective than the control. The dominance of p-values when comparing two groups in medical (and other) research is overwhelming. Nuijten et al [1] showed in a meta-analysis that 26% of the 258,105 p-values reported in journals between 1985 and 2013 belonged to a t-statistic; see Wetzels et al [2].
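For the Welch variant studied in this paper, the t-statistic is the difference in group means divided by the unpooled standard error. Its degrees of freedom follow the Welch-Satterthwaite approximation. A minimal sketch follows. `welch_t` is a hypothetical helper. The blood-pressure readings are made-up values for illustration only and are not data from the paper.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    se1 = variance(x) / n1  # squared standard error of mean(x); variance uses n - 1
    se2 = variance(y) / n2  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical systolic blood pressure readings (mmHg), for illustration only.
treatment = [128.0, 131.5, 125.2, 133.8, 129.9]
control = [135.1, 138.4, 132.7, 140.2, 136.0, 137.3]
t, df = welch_t(treatment, control)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The degrees of freedom always lie between min(n1, n2) - 1 and n1 + n2 - 2. When both groups have equal sizes and equal sample variances, they reduce to the Student's value of 2(n - 1). For real analyses, an established routine such as SciPy's `scipy.stats.ttest_ind` with `equal_var=False` computes the same statistic together with its p-value.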
