Abstract

Outliers can have a large influence on the model fitted to data. The models we consider are the transformation of data to approximate normality and discriminant analysis, perhaps on transformed observations. If there are only one or a few outliers, they can often be detected by the deletion methods associated with regression diagnostics. These can be thought of as 'backwards' methods, since they start from a model fitted to all the data. However, such methods become cumbersome, and may fail, in the presence of multiple outliers. We instead consider a 'forward' procedure in which very robust methods, such as least median of squares, are used to select a small, outlier-free subset of the data. This subset is then increased in size using a search that avoids the inclusion of outliers. During the forward search we monitor quantities of interest, such as score statistics for transformation or, in discriminant analysis, misclassification probabilities. Examples demonstrate how the method very clearly reveals structure in the data and finds influential observations, which appear towards the end of the search. In our examples these influential observations can readily be related to patterns in the original data, perhaps after transformation.
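To make the procedure concrete, the following is a minimal sketch of a forward search for the regression case, written in Python with NumPy. It is illustrative only: the function names (lms_start, forward_search), the elemental-set approximation to least median of squares, and the monitored quantity (the largest squared residual in the current subset) are assumptions chosen for exposition, not the paper's implementation; in the paper the monitored quantities are score statistics for transformation or misclassification probabilities in discriminant analysis.

# Minimal sketch of a forward search for a linear model y = X b + e.
# All names and defaults here are hypothetical, for illustration only.
import numpy as np

def lms_start(X, y, n_trials=500, rng=None):
    """Approximate least median of squares by refitting many elemental subsets
    and keeping the fit with the smallest median squared residual."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    best_b, best_crit = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)           # elemental subset of size p
        b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        crit = np.median((y - X @ b) ** 2)                   # LMS-type criterion over all n
        if crit < best_crit:
            best_crit, best_b = crit, b
    return best_b

def forward_search(X, y, m0=None):
    """Grow an outlier-free subset one observation at a time,
    recording a monitored quantity at each subset size."""
    n, p = X.shape
    m0 = m0 or p + 1
    b = lms_start(X, y)
    resid2 = (y - X @ b) ** 2
    subset = np.argsort(resid2)[:m0]                         # initial, robustly chosen subset
    history = []
    for m in range(m0, n + 1):
        b, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        resid2 = (y - X @ b) ** 2
        # Monitored quantity: here the largest squared residual within the subset;
        # the paper instead monitors score statistics or misclassification probabilities.
        history.append((m, resid2[subset].max()))
        subset = np.argsort(resid2)[:min(m + 1, n)]          # next, larger subset
    return history

Because each step refits on the currently best-fitting m observations and then admits only the next best-fitting one, outliers, if present, tend to enter only in the final steps, where sharp changes in the monitored quantity make them visible.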
