Abstract

Critical to any regression analysis is the identification of observations that exert a strong influence on the fitted regression model. Traditional regression influence statistics such as Cook's distance and DFFITS, each based on deleting single observations, can fail in the presence of multiple influential observations if these influential observations “mask” one another, or if other effects such as “swamping” occur. Masking refers to the situation where an observation reveals itself as influential only after one or more other observations are deleted. Swamping occurs when points that are not actually outliers or influential observations are declared to be so because of the effects on the model of other unusual observations. One computationally expensive solution to these problems is the use of influence statistics that delete multiple rather than single observations. In this article, we build on previous work to produce a computationally feasible algorithm for detecting an unknown number of influential observations in the presence of masking. An important difference between our proposed algorithm and existing methods is that we focus on the data that remain after observations are deleted, rather than on the deleted observations themselves. Further, our approach uses a novel confirmatory step designed to provide a secondary assessment of identified observations. Supplementary materials for this article are available online.
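To make the masking effect concrete, the following is a minimal sketch in Python with NumPy. It covers only the traditional single-deletion Cook's distance mentioned above, not the algorithm proposed in the article. The toy data (ten points on the line y = 2x plus two identical outliers at x = 20, y = 0), the function names `cooks_distance` and `cooks_distance_bruteforce`, and the cutoff of 1 are all assumptions made for illustration.

```python
import numpy as np

def cooks_distance(X, y):
    """Closed-form Cook's distance for each row of an OLS fit.

    X is assumed to already contain the intercept column.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # leverages h_ii
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)                  # residual variance estimate
    return resid**2 / (p * s2) * h / (1.0 - h) ** 2

def cooks_distance_bruteforce(X, y):
    """Same statistic computed by actually refitting without each row."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        diff = beta - beta_i
        out[i] = diff @ (X.T @ X) @ diff / (p * s2)
    return out

# Hypothetical toy data: a clean line plus two coincident outliers.
x_clean = np.arange(10.0)
x = np.concatenate([x_clean, [20.0, 20.0]])
y = np.concatenate([2.0 * x_clean, [0.0, 0.0]])
X = np.column_stack([np.ones_like(x), x])

d_full = cooks_distance(X, y)                     # both outliers present

keep = np.arange(len(x)) != 11                    # delete the second outlier
d_reduced = cooks_distance(X[keep], y[keep])      # index 10 is the remaining one

print("outlier with its twin present:", d_full[10])
print("outlier after twin is deleted:", d_reduced[10])
```

On this toy data, the Cook's distance of the outlier at index 10 stays below 1 while its twin is present and rises well above 1 once the twin is deleted. The second outlier keeps pulling the fitted line toward the pair, so deleting either one alone changes the fit only modestly. That is the masking pattern a single-deletion diagnostic can miss.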
