Abstract
As large data sets requiring analysis and empirical model building have proliferated, outliers have become commonplace. Fortunately, several standard statistical software packages now let practitioners use robust regression estimators to fit data sets that are contaminated with outliers. However, little guidance is available for selecting the best subset of the predictor variables when using these robust estimators. We initially consider cross-validation and bootstrap resampling methods that have performed well for least-squares variable selection. It turns out that these variable selection methods cannot be directly applied to contaminated data sets using a robust estimation scheme: the prediction errors, inflated by the outliers, are not reliable measures of how well the robust model fits the data. We therefore propose new resampling variable selection methods that introduce alternative estimates of prediction error in the contaminated model. We demonstrate that, although robust estimation and resampling variable selection are both computationally complex procedures, the two techniques can be combined for superior results using modest computational resources. Monte Carlo simulation is used to evaluate the proposed variable selection procedures against alternatives through a designed experiment approach. The experimental factors include the percentage of outliers, outlier geometry, bootstrap sample size, number of bootstrap samples, and cross-validation assessment size. The results are summarized and recommendations for use are provided.
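The central point, that outlier-inflated prediction errors can mislead variable selection even when the fit itself is robust, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the Huber M-estimator fitted by iteratively reweighted least squares stands in for whichever robust estimator is used, the 20% trimmed mean of squared cross-validation errors stands in for the proposed alternative estimates of prediction error (which the abstract does not specify), and the simulated data and the names `huber_fit` and `cv_prediction_errors` are hypothetical.

```python
import numpy as np

def huber_fit(X, y, c=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimate via iteratively reweighted least squares (illustrative)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares starting value
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust residual scale: normalized median absolute deviation.
        scale = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        u = np.abs(r) / scale
        w = c / np.maximum(u, c)  # Huber weights: 1 if u <= c, else c / u
        sw = np.sqrt(w)
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

def cv_prediction_errors(X, y, cols, n_splits=5, keep=0.8, seed=0):
    """Return (mean, trimmed mean) of squared K-fold prediction errors.

    The trimmed mean keeps only the smallest `keep` fraction of squared
    errors. This is an assumed stand-in for a robust error estimate.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    resid = np.empty(len(y))
    for fold in np.array_split(idx, n_splits):
        train = np.setdiff1d(idx, fold)
        beta = huber_fit(X[train][:, cols], y[train])
        resid[fold] = y[fold] - X[fold][:, cols] @ beta
    sq = np.sort(resid ** 2)
    k = int(np.floor(keep * len(sq)))
    return sq.mean(), sq[:k].mean()

# Simulated data (hypothetical): y depends on predictors 1 and 2 only,
# and 10% of the responses are shifted to act as outliers.
rng = np.random.default_rng(1)
n = 100
Z = rng.normal(size=(n, 3))
X = np.column_stack([np.ones(n), Z])  # column 0 is the intercept
y = 1 + 2 * Z[:, 0] - 3 * Z[:, 1] + rng.normal(scale=0.5, size=n)
y[:10] += 20

mean_true, trimmed_true = cv_prediction_errors(X, y, cols=[0, 1, 2])
mean_wrong, trimmed_wrong = cv_prediction_errors(X, y, cols=[0, 1, 3])
print("true subset :", mean_true, trimmed_true)
print("wrong subset:", mean_wrong, trimmed_wrong)
```

In this setup the ordinary mean of squared errors is dominated by the shifted responses for every candidate subset, whereas the trimmed version mostly reflects the fit to the uncontaminated points, so it separates the correct subset from one that omits a relevant predictor more clearly.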