Tests for the equality of variances are of interest in many areas, such as quality control, agricultural production systems, experimental education, pharmacology and biology, and also as a preliminary to the analysis of variance, dose–response modelling or discriminant analysis. The literature is vast. Traditional non-parametric tests are due to Mood, Miller and Ansari–Bradley. A test which usually stands out in terms of power and robustness against non-normality is the W50 modification, due to Brown and Forsythe [Robust tests for the equality of variances, J. Am. Stat. Assoc. 69 (1974), pp. 364–367], of the Levene test [Robust tests for equality of variances, in Contributions to Probability and Statistics, I. Olkin, ed., Stanford University Press, Stanford, 1960, pp. 278–292]. This paper deals with the two-sample scale problem, and in particular with Levene type tests. We consider 10 Levene type tests: the W50, the M50 and L50 tests [G. Pan, On a Levene type test for equality of two variances, J. Stat. Comput. Simul. 63 (1999), pp. 59–71], the R-test [R.G. O'Brien, A general ANOVA method for robust tests of additive models for variances, J. Am. Stat. Assoc. 74 (1979), pp. 877–880], as well as the bootstrap and permutation versions of the W50, L50 and R tests. We also consider the F-test, the modified Fligner and Killeen [Distribution-free two-sample tests for scale, J. Am. Stat. Assoc. 71 (1976), pp. 210–213] test, an adaptive test due to Hall and Padmanabhan [Adaptive inference for the two-sample scale problem, Technometrics 23 (1997), pp. 351–361] and the two tests due to Shoemaker [Tests for differences in dispersion based on quantiles, Am. Stat. 49(2) (1995), pp. 179–182; Interquantile tests for dispersion in skewed distributions, Commun. Stat. Simul. Comput. 28 (1999), pp. 189–205]. The aim is to identify effective methods for detecting scale differences.
Our study differs from earlier ones in that it focuses on resampling versions of the Levene type tests, and many of the tests considered here have not previously been proposed and/or compared. The computationally simplest test found to be robust is W50. Higher power, while preserving robustness, is achieved by the resampling versions of Levene type tests, such as the permutation R-test (recommended for normal and light-tailed distributions) and the bootstrap L50 test (recommended for heavy-tailed and skewed distributions). Among non-Levene type tests, the best one is the adaptive test due to Hall and Padmanabhan.
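To make the idea of a resampling version of a Levene type test concrete, the following is a minimal sketch of a two-sample W50 statistic (the one-way ANOVA F statistic computed on absolute deviations from each sample median) together with one simple permutation scheme. The function names `w50_statistic` and `permutation_w50`, the choice to permute the pooled median-centred observations, and the defaults `n_perm` and `seed` are assumptions made for illustration; the resampling schemes actually used in the paper may differ.

```python
import numpy as np


def w50_statistic(x, y):
    """Two-sample W50 (Brown-Forsythe) statistic.

    This is the one-way ANOVA F statistic applied to the absolute
    deviations of each observation from its own sample median.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = np.abs(x - np.median(x))
    zy = np.abs(y - np.median(y))
    n, m = len(zx), len(zy)
    grand_mean = (zx.sum() + zy.sum()) / (n + m)
    # Between-group sum of squares (1 degree of freedom for two groups).
    between = n * (zx.mean() - grand_mean) ** 2 + m * (zy.mean() - grand_mean) ** 2
    # Within-group sum of squares (n + m - 2 degrees of freedom).
    within = ((zx - zx.mean()) ** 2).sum() + ((zy - zy.mean()) ** 2).sum()
    return between / (within / (n + m - 2))


def permutation_w50(x, y, n_perm=2000, seed=0):
    """Permutation p-value for the W50 statistic.

    Assumption (illustrative, not taken from the paper): each sample is
    centred at its own median, the centred values are pooled, and the
    pooled values are randomly reassigned to the two groups.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = w50_statistic(x, y)
    pooled = np.concatenate([x - np.median(x), y - np.median(y)])
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if w50_statistic(perm[:n], perm[n:]) >= observed:
            exceed += 1
    # The +1 terms count the observed arrangement itself, so p is never 0.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value
```

Calling `permutation_w50(x, y)` returns the observed statistic and an estimated p-value; a small p-value suggests a difference in scale between the two samples. A bootstrap version could be sketched in the same way by resampling the pooled centred values with replacement instead of permuting them.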