Bayesian Hodges-Lehmann tests for statistical equivalence in the two-sample setting: Power analysis, type I error rates and equivalence boundary selection in biomedical research

Riko Kelter

doi:10.1186/s12874-021-01341-7

Abstract

Background: Null hypothesis significance testing (NHST) is among the most frequently employed methods in the biomedical sciences. However, the problems of NHST and p-values have been discussed widely and various Bayesian alternatives have been proposed. Some proposals focus on equivalence testing, which aims at testing an interval hypothesis instead of a precise hypothesis. An interval hypothesis includes a small range of parameter values instead of a single null value, an idea that goes back to Hodges and Lehmann. As researchers can always expect to observe some (although often negligibly small) effect size, interval hypotheses are more realistic for biomedical research. However, the selection of an equivalence region (the interval boundaries) often seems arbitrary, and several Bayesian approaches to equivalence testing coexist.

Methods: A new method is proposed for determining the equivalence region for Bayesian equivalence tests based on objective criteria such as type I error rate and power. Existing approaches to Bayesian equivalence testing in the two-sample setting are discussed with a focus on the Bayes factor and the region of practical equivalence (ROPE). A simulation study derives the results needed to apply the new method in the two-sample setting, which is among the most frequently carried out procedures in biomedical research.

Results: Bayesian Hodges-Lehmann tests for statistical equivalence differ in their sensitivity to the prior modeling, power, and the associated type I error rates. The relationship between type I error rates, power and sample sizes for existing Bayesian equivalence tests is identified in the two-sample setting. The results allow the equivalence region to be determined with the new method by incorporating such objective criteria. Importantly, the results show that not only can prior selection influence the type I error rate and power, but the relationship is even reversed for the Bayes factor and ROPE-based equivalence tests.

Conclusion: Based on the results, researchers can select between the existing Bayesian Hodges-Lehmann tests for statistical equivalence and determine the equivalence region based on objective criteria, thus improving the reproducibility of biomedical research.
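
The contrast between a precise hypothesis and an interval hypothesis can be written out as a short sketch. The notation is an assumption made here for illustration and is not taken from the abstract: \delta denotes the effect size, \delta_0 the null value, \varepsilon > 0 the half-width of the equivalence region R, and y the observed data. The last line is the standard definition of a Bayes factor as the ratio of posterior odds to prior odds.

```latex
\begin{align*}
  \text{precise hypothesis:}  \quad & H_0 : \delta = \delta_0
      \quad \text{vs.} \quad H_1 : \delta \neq \delta_0 \\
  \text{interval hypothesis:} \quad & H_0 : \delta \in R = [\delta_0 - \varepsilon,\ \delta_0 + \varepsilon]
      \quad \text{vs.} \quad H_1 : \delta \notin R \\
  \text{Bayes factor:}        \quad & \mathrm{BF}_{01}
      = \frac{P(H_0 \mid y) \,/\, P(H_1 \mid y)}{P(H_0) \,/\, P(H_1)}
\end{align*}
```

Choosing the equivalence region then amounts to choosing \varepsilon, which is the quantity the proposed method ties to type I error rate and power.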

Highlights

Null hypothesis significance testing (NHST) is among the most frequently employed methods in the biomedical sciences
Researchers can select between the existing Bayesian Hodges-Lehmann tests for statistical equivalence and determine the equivalence region based on objective criteria, improving the reproducibility of biomedical research (Kelter, BMC Medical Research Methodology (2021) 21:171)
Type I error rates and influence of sample size: This section analyses the first part of the first research question: Which type I error rates are attained by the various available Bayesian approaches to equivalence testing, and how do the obtained type I error rates depend on sample size? Figure 1 shows the resulting type I error rates for Bayesian equivalence testing approaches which are based on the Bayes factor and the region of practical equivalence (ROPE).
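
How a type I error rate for a ROPE-based test can be estimated by simulation is sketched below. This is a minimal illustration under stated assumptions, not the simulation design of the paper: the data are standard normal in both groups (true effect size zero), the posterior uses a noninformative prior per group, the ROPE is set to [-0.1, 0.1], the decision rule compares a 95% highest density interval (HDI) with the ROPE, and a type I error is counted when the HDI lies entirely outside the ROPE. All function names are hypothetical.

```python
import math
import random
import statistics


def group_summary(sample):
    """Sample size, mean and unbiased variance of one group."""
    return len(sample), statistics.fmean(sample), statistics.variance(sample)


def draw_mu_sigma2(summary, rng):
    """One posterior draw of (mu, sigma^2) under the prior p(mu, sigma^2)
    proportional to 1/sigma^2 (an assumption of this sketch)."""
    n, mean, s2 = summary
    # chi-square(n - 1) is Gamma(shape=(n - 1) / 2, scale=2)
    sigma2 = (n - 1) * s2 / rng.gammavariate((n - 1) / 2.0, 2.0)
    mu = rng.gauss(mean, math.sqrt(sigma2 / n))
    return mu, sigma2


def posterior_effect_size(x, y, rng, draws=2000):
    """Posterior draws of the standardized mean difference."""
    sx, sy = group_summary(x), group_summary(y)
    samples = []
    for _ in range(draws):
        mu_x, v_x = draw_mu_sigma2(sx, rng)
        mu_y, v_y = draw_mu_sigma2(sy, rng)
        samples.append((mu_x - mu_y) / math.sqrt((v_x + v_y) / 2.0))
    return samples


def hdi(samples, mass=0.95):
    """Shortest interval containing the given share of the draws."""
    s = sorted(samples)
    k = math.ceil(mass * len(s))
    start = min(range(len(s) - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]


def rope_decision(samples, rope=(-0.1, 0.1), mass=0.95):
    """'accept' if the HDI is inside the ROPE, 'reject' if it is entirely
    outside, otherwise 'undecided'."""
    lo, hi = hdi(samples, mass)
    if lo >= rope[0] and hi <= rope[1]:
        return "accept"
    if hi < rope[0] or lo > rope[1]:
        return "reject"
    return "undecided"


def type_one_error_rate(n, reps=200, rope=(-0.1, 0.1), draws=1000, seed=1):
    """Share of simulated null datasets (true effect zero) that are rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        samples = posterior_effect_size(x, y, rng, draws)
        if rope_decision(samples, rope) == "reject":
            rejections += 1
    return rejections / reps
```

Calling `type_one_error_rate(n)` for several values of `n` gives a rough picture of how the estimated rate changes with sample size under these assumptions; widening or narrowing `rope` shows how the equivalence boundaries enter the result.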

Summary

Introduction

Null hypothesis significance testing (NHST) is among the most frequently employed methods in the biomedical sciences. Among the problems of NHST are inflated type I error rates [5, 6], the inability to make use of optional stopping [7,8,9] and problems with the interpretation of censored data [7, 8], which are frequently observed in the biomedical sciences, for example in clinical trials. Those problems are caused mostly by the fact that frequentist NHST and p-values violate the likelihood principle [10], which is of paramount importance in statistical science. In Bayesian inference, probabilistic statements about parameters can be made instead of relying only on likelihood-based reasoning [13, 16].

Methods

Results

Discussion

Conclusion