Previous studies have examined the behavior of outlier detection rules for symmetric distributions. These rules label as “outside” any observation that falls outside the interval [F_L - k(F_U - F_L), F_U + k(F_U - F_L)], where F_L and F_U are functions of the order statistics that estimate the 0.25 and 0.75 quantiles of the distribution underlying the i.i.d. sample. One measure of the performance of such a rule is the “some-outside rate” (SOR) per sample: the probability that a sample of size n contains one or more observations labeled as “outside,” given that a specified null distribution (usually Gaussian) is the true distribution. In this paper, asymptotic expansions of k = k_n as a function of n that guarantee an asymptotically constant, prespecified SOR are given for a variety of symmetric null distributions, including the Gaussian, double exponential, logistic, and Cauchy distributions. With a slight modification of the labeling rule, the main theorem also applies to nonsymmetric null distributions.
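The labeling rule and the SOR can be illustrated with a small Monte Carlo sketch. It is not the paper's method: the asymptotic expansions of k_n are not reproduced here, and the names `some_outside` and `estimate_sor` are hypothetical. The sketch also assumes that F_L and F_U are taken to be the sample quartiles returned by Python's `statistics.quantiles` with the "inclusive" method, which is only one possible choice of order-statistic estimator.

```python
import random
import statistics


def some_outside(sample, k):
    """Return True if any observation lies outside
    [F_L - k*(F_U - F_L), F_U + k*(F_U - F_L)].

    Assumption: F_L and F_U are the sample quartiles from
    statistics.quantiles(method="inclusive"); the paper may define them
    differently in terms of the order statistics.
    """
    f_l, _, f_u = statistics.quantiles(sample, n=4, method="inclusive")
    spread = f_u - f_l
    lower, upper = f_l - k * spread, f_u + k * spread
    return any(x < lower or x > upper for x in sample)


def estimate_sor(n, k, draw=lambda rng: rng.gauss(0.0, 1.0), reps=2000, seed=0):
    """Monte Carlo estimate of the some-outside rate for samples of size n.

    `draw` generates one observation from the null distribution
    (standard Gaussian by default, an illustrative choice).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        hits += some_outside(sample, k)
    return hits / reps


if __name__ == "__main__":
    # For fixed k, see how the estimated SOR changes with the sample size n.
    for n in (10, 50, 200):
        print(n, estimate_sor(n, k=1.5))
```

Running the loop for a fixed k lets one check how the estimated SOR varies with n, which is the dependence that the choice k = k_n is meant to remove.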