Abstract

This chapter reviews diagnostic and robust procedures for detecting outliers and other interesting observations in linear regression. First, we present statistics for detecting single outliers and influential observations and show their limitations for multiple outliers in high-leverage situations. Second, we discuss diagnostic procedures designed to avoid masking by first finding a clean subset for estimating the parameters and then increasing its size by incorporating new homogeneous observations one by one until a heterogeneous observation is found. We also discuss procedures based on sensitive observations for detecting high-leverage outliers in large data sets using the eigenvectors of a sensitivity matrix. We briefly review robust estimation methods and their relationship with diagnostic procedures. Next, we consider large high-dimensional data sets, where iterative procedures can be slow, and show that the joint use of simple univariate statistics, such as predictive residuals, Cook’s distances, and Peña’s sensitivity statistic, can be a useful diagnostic tool. We also comment on other recent procedures based on regularization and sparse estimation and conclude with a brief analysis of the relationship between outlier detection and cluster analysis. A real data example and a simulated example illustrate the procedures discussed in the chapter.
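The three univariate statistics mentioned above can all be obtained from a single least-squares fit through the hat matrix H = X(X'X)^{-1}X', which is why they stay cheap when iterative procedures become slow. The following is a minimal NumPy sketch, not the chapter's implementation. It assumes an (n, p) design matrix that already contains an intercept column and no observation with leverage h_ii = 1. It takes predictive residuals to be the leave-one-out prediction errors e_i / (1 - h_ii), uses the usual Cook's distance D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2), and writes Peña's statistic in the form S_i = sum_j h_ij^2 e_j^2 / (p s^2 h_ii (1 - h_jj)^2), i.e. the squared changes in the prediction of observation i when each observation j is deleted, scaled by p times the estimated variance of that prediction. The function name `regression_diagnostics` and the simulated data are hypothetical, for illustration only.

```python
import numpy as np


def regression_diagnostics(X, y):
    """Hypothetical helper: leave-one-out diagnostics for an OLS fit of y on X.

    Assumes X is (n, p) with an intercept column and every leverage h_ii < 1.
    Returns (predictive residuals, Cook's distances, Pena's sensitivity statistic).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Hat matrix H = X (X'X)^{-1} X' and leverages h_ii
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    e = y - H @ y                     # ordinary least-squares residuals
    s2 = e @ e / (n - p)              # residual variance estimate s^2
    pred = e / (1.0 - h)              # predictive residuals y_i - yhat_(i)
    cook = e**2 * h / (p * s2 * (1.0 - h) ** 2)
    # Pena's S_i: sum_j h_ij^2 * (e_j / (1 - h_jj))^2, divided by p s^2 h_ii
    w = pred**2
    S = (H**2 @ w) / (p * s2 * h)
    return pred, cook, S


# Illustrative simulated data with one planted high-leverage outlier
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
X[0, 1] = 6.0    # far from the other x values: high leverage
y[0] = -10.0     # far from the line 1 + 2x: outlying response

pred, cook, S = regression_diagnostics(X, y)
print("largest |predictive residual| at", int(np.argmax(np.abs(pred))))
print("largest Cook's distance at", int(np.argmax(cook)))
```

The sketch builds the full n × n hat matrix for readability. For large n the same quantities follow from the leverages and the identity sum_j h_ij^2 w_j = x_i' A (sum_j w_j x_j x_j') A x_i with A = (X'X)^{-1}, which costs O(n p^2) instead of O(n^2).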
