Abstract

Five procedures for detecting outliers in linear regression are compared: sequential testing of the maximum internally studentized residual or the maximum externally studentized (cross-validatory) residual, Marasinghe's multistage procedure, and two procedures based on recursive residuals calculated on adaptively ordered observations. All of these procedures initially test a no-outliers hypothesis, and they share an underlying unity in their general approach to the outlier-identification problem. Which procedure is most effective depends on the number and placement of the outliers in the data. The multistage procedure is very effective in some cases, but it requires prespecifying a value k, the maximum number of outliers that can then be detected; its performance can suffer severely if the chosen k is either larger or smaller than the number of outliers actually present in the data.
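To make the sequential approach concrete, the sketch below illustrates one of the procedure families named above: repeated testing of the maximum externally studentized residual, deleting the flagged observation and refitting until the no-outliers hypothesis is retained. This is a minimal illustration under assumed details, not the paper's exact procedures; in particular the Bonferroni-type critical value and the optional cap on the number of deletions are illustrative choices, not taken from the source.

```python
# Minimal sketch of sequential outlier testing via the maximum externally
# studentized residual. The critical value (Bonferroni bound) and the cap on
# deletions are illustrative assumptions, not the paper's specification.
import numpy as np
from scipy import stats

def externally_studentized(y, X):
    """Externally studentized (cross-validatory) residuals for an OLS fit."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                                   # ordinary residuals
    H = X @ np.linalg.solve(X.T @ X, X.T)              # hat matrix
    h = np.diag(H)                                     # leverages
    s2 = e @ e / (n - p)                               # residual variance
    s2_i = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)  # leave-one-out variance
    return e / np.sqrt(s2_i * (1 - h))

def sequential_outlier_test(y, X, alpha=0.05, max_outliers=None):
    """Delete the observation with the largest |t_i| while the maximum exceeds
    a Bonferroni critical value; return the indices flagged as outliers."""
    idx = np.arange(len(y))
    flagged = []
    limit = max_outliers if max_outliers is not None else len(y)
    while len(flagged) < limit:
        t = externally_studentized(y[idx], X[idx])
        n, p = len(idx), X.shape[1]
        crit = stats.t.ppf(1 - alpha / (2 * n), df=n - p - 1)  # Bonferroni bound
        j = int(np.argmax(np.abs(t)))
        if np.abs(t[j]) <= crit:
            break                                      # no-outliers hypothesis retained
        flagged.append(int(idx[j]))
        idx = np.delete(idx, j)
    return flagged
```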
