On the Exact Size of Tests of Treatment Effects in Multi-Arm Clinical Trials

Chris J Lloyd

doi:10.1111/anzs.12089

Abstract

When testing treatment effects in multi-arm clinical trials, the Bonferroni method or the method of Simes (1986) is used to adjust for multiple comparisons. When control of the family-wise error rate is required, these methods are combined with the closed testing principle of Marcus et al. (1976). Under weak assumptions, the resulting p-values all give rise to valid tests, provided that the basic test used for each treatment is valid. However, standard tests can be far from valid, especially when the endpoint is binary and sample sizes are unbalanced, as is common in multi-arm clinical trials. This paper examines the relationship between size deviations of the component test and size deviations of the multiple comparison test. The conclusion is that multiple comparison tests are as imperfect as the basic tests at nominal size α/m, where m is the number of treatments. This conclusion, though admittedly not unexpected, implies that these methods should be used only when the component test is very accurate at small nominal sizes. For binary endpoints, this suggests using the parametric bootstrap test. All of these conclusions are supported by a detailed numerical study.
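
The size deviations discussed above can be checked directly by exact enumeration for a binary endpoint. The sketch below is a minimal illustration, not the author's procedure. It assumes a one-sided pooled (score) z-test of H0: p1 = p0 against p1 > p0 for one treatment arm versus control. It computes the exact rejection probability by summing binomial probabilities over the rejection region and takes the maximum over a grid of common null proportions p, which approximates the exact size from below. The function names `pooled_z`, `binom_pmf` and `exact_size`, the grid resolution, and the unbalanced sample sizes in the usage lines are hypothetical choices for illustration. Evaluating the test at both α and the Bonferroni level α/m shows its accuracy at the small nominal size that, according to the abstract, governs the multiple comparison test.

```python
from math import comb, sqrt
from statistics import NormalDist


def pooled_z(x1, n1, x0, n0):
    """One-sided pooled (score) z-statistic for H0: p1 = p0 vs H1: p1 > p0.

    Returns 0.0 when the pooled variance is zero (all successes or all failures),
    so such tables never lead to rejection.
    """
    p_pool = (x1 + x0) / (n1 + n0)
    var = p_pool * (1 - p_pool) * (1 / n1 + 1 / n0)
    if var == 0:
        return 0.0
    return (x1 / n1 - x0 / n0) / sqrt(var)


def binom_pmf(n, p):
    """Binomial(n, p) probabilities for k = 0, 1, ..., n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def exact_size(n1, n0, alpha, grid_points=199):
    """Maximum rejection probability of the nominal-alpha pooled z-test
    over a grid of common null proportions p1 = p0 = p.

    The grid maximum approximates the true supremum over p from below.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # The rejection region does not depend on p, so enumerate it once.
    reject = [
        (x1, x0)
        for x1 in range(n1 + 1)
        for x0 in range(n0 + 1)
        if pooled_z(x1, n1, x0, n0) >= z_crit
    ]
    size = 0.0
    for i in range(1, grid_points + 1):
        p = i / (grid_points + 1)
        f1, f0 = binom_pmf(n1, p), binom_pmf(n0, p)
        prob = sum(f1[x1] * f0[x0] for x1, x0 in reject)
        size = max(size, prob)
    return size


if __name__ == "__main__":
    m = 3            # hypothetical number of treatment arms
    n1, n0 = 20, 60  # hypothetical unbalanced treatment and control sizes
    for a in (0.05, 0.05 / m):
        print(f"nominal {a:.4f}  exact size {exact_size(n1, n0, a):.4f}")
```

Because the rejection region at α/m is contained in the region at α, the exact size at α/m can never exceed the size at α. What matters is the ratio of the exact size to the nominal level, and the abstract's conclusion is that this ratio at α/m is what the multiple comparison test inherits.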
