What to make of equivalence testing with a post-specified margin?

Harlan Campbell, Paul Gustafson

doi:10.15626/mp.2020.2506

Abstract

In order to determine whether or not an effect is absent based on a statistical test, the recommended frequentist tool is the equivalence test. Typically, it is expected that an appropriate equivalence margin has been specified before any data are observed. Unfortunately, this can be a difficult task. If the margin is too small, then the test's power will be substantially reduced. If the margin is too large, any claims of equivalence will be meaningless. Moreover, it remains unclear how defining the margin afterwards will bias one's results. In this short article, we consider a series of hypothetical scenarios in which the margin is defined post-hoc or is otherwise considered controversial. We also review a number of relevant, potentially problematic actual studies from the clinical trials research, with the aim of motivating a critical discussion as to what is acceptable and desirable in the reporting and interpretation of equivalence tests.

Highlights

Although the researchers fail to pre-specify a margin before observing the data, the regulatory agency will still accept a claim of equivalence/non-inferiority on the basis that, given some non-controversial post-hoc margin, there is sufficient evidence.
Researchers argue that equivalence testing has great potential to “facilitate theory falsification” (Quintana, 2018).
Expectations that a margin be pre-specified have been well established for quite some time (Piaggio et al., 2006).

Summary

Introduction

If the equivalence margin is not specified a priori but is instead defined based on the observed data, X, we have the following admittedly improper hypothesis test:

H0: θ ≤ −∆(X) or θ ≥ ∆(X)  vs.  H1: −∆(X) < θ < ∆(X).

In this case, it is not necessarily true that Pr(reject H0 | H0 is true) ≤ α. To better understand the concern raised by Ng (2003), consider a similar setup where, for a standard null hypothesis significance test, a large, possibly infinite number of prespecified α-levels (allowable type I error rates) is defined.

Although the researchers fail to pre-specify a margin before observing the data, the regulatory agency will still accept a claim of equivalence/non-inferiority on the basis that, given some non-controversial post-hoc margin, there is sufficient evidence. Readers are left to judge for themselves.
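To make the role of the margin concrete, the following is a minimal sketch, not the authors' own procedure. It assumes a normally distributed point estimate with a known standard error and uses the two one-sided tests (TOST) form of the equivalence test. The function names `tost_p_value` and `smallest_rejecting_margin` are hypothetical and chosen only for illustration. The second function returns the smallest margin for which equivalence would be declared at level α; any margin chosen post-hoc above this value rejects H0, which is why a data-dependent ∆(X) can guarantee rejection.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def tost_p_value(estimate, se, margin):
    """TOST p-value for H0: theta <= -margin or theta >= margin.

    Assumes (illustratively) that `estimate` is normal with known
    standard error `se`.
    """
    # One-sided test of H0a: theta <= -margin (reject for large estimates).
    p_lower = 1.0 - _STD_NORMAL.cdf((estimate + margin) / se)
    # One-sided test of H0b: theta >= margin (reject for small estimates).
    p_upper = _STD_NORMAL.cdf((estimate - margin) / se)
    # Equivalence is declared only if both one-sided tests reject.
    return max(p_lower, p_upper)


def smallest_rejecting_margin(estimate, se, alpha=0.05):
    """Smallest margin for which the TOST p-value is at most alpha.

    This equals the largest absolute bound of the (1 - 2*alpha)
    confidence interval for theta.
    """
    z = _STD_NORMAL.inv_cdf(1.0 - alpha)
    return abs(estimate) + z * se


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from any study.
    estimate, se, alpha = 0.10, 0.05, 0.05
    threshold = smallest_rejecting_margin(estimate, se, alpha)
    for margin in (0.5 * threshold, threshold, 2.0 * threshold):
        p = tost_p_value(estimate, se, margin)
        print(f"margin={margin:.4f}  p={p:.4f}")
```

Reporting this threshold alongside the estimate is one way to let readers judge for themselves whether a margin they consider non-controversial would have supported a claim of equivalence.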

Conclusion

Findings

Conflict of Interest and Funding