Abstract

Stability selection (the multisplit approach) is a variable selection procedure that relies on multiple splits of the data to overcome the shortcomings of a single split. Unfortunately, this procedure performs very poorly when the original data contain outliers or other contamination. The problem becomes more complicated when the regression residuals are serially correlated. This paper presents a new robust stability selection procedure to remedy the combined problem of autocorrelation and outliers. We demonstrate the good performance of the proposed robust selection method using real air quality data and a simulation study.

Highlights

  • The approach of splitting data into two parts is not new in statistical inference and data analysis

  • The main aim of this study was to develop a reliable alternative approach capable of selecting the correct variables for the final model when the data are affected by both outliers and autocorrelated errors

  • We considered the well-known all-subsets Akaike Information Criterion (AIC) and all-subsets Bayesian Information Criterion (BIC) methods, together with the multisplit-AIC and multisplit-BIC variable selection methods, in this regard


Summary

Introduction

The approach of splitting data into two parts is not new in statistical inference and data analysis. The existing classical linear regression stability selection procedure is affected by outliers, so the variables selected for the final model are unreliable. This problem can be rectified by incorporating a robust estimator into the selection procedure. The Cochrane-Orcutt or Prais-Winsten methods (Greene [12], Gujarati and Porter [11]) are often used to rectify the autocorrelation problem. These procedures are based on OLS estimates, which are not robust and are affected by outliers. The concentrated (clean) dataset is identified, and all-possible-subsets procedures, namely the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) methods, were applied to the concentrated dataset in the last steps of the RFCH method. This approach is called concentrating all-subset selection and can be considered a trade-off between the quality of data and the interpretability of a model.
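To make the all-subsets AIC/BIC step concrete, the following is a minimal sketch of the classical (non-robust) version, fitted by ordinary least squares. It is not the paper's robust procedure: there, the same kind of search would be run on the concentrated (clean) dataset. The function name `all_subsets_ic`, the synthetic data, and the Gaussian form of the criteria (stated up to an additive constant) are assumptions made for illustration.

```python
import itertools

import numpy as np


def all_subsets_ic(X, y, criterion="bic"):
    """Return (best_subset, best_score) over all 2**p predictor subsets.

    Uses the Gaussian linear-model form of the criteria, up to an additive
    constant: n * log(RSS / n) + penalty * k, where k counts the fitted
    coefficients (intercept included), penalty = 2 for AIC and log(n) for BIC.
    """
    if criterion not in ("aic", "bic"):
        raise ValueError("criterion must be 'aic' or 'bic'")
    n, p = X.shape
    penalty = 2.0 if criterion == "aic" else np.log(n)
    best_subset, best_score = (), np.inf
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            # Design matrix: intercept column plus the chosen predictors.
            design = np.column_stack([np.ones(n), X[:, list(subset)]])
            beta, *_ = np.linalg.lstsq(design, y, rcond=None)
            rss = float(np.sum((y - design @ beta) ** 2))
            score = n * np.log(rss / n) + penalty * design.shape[1]
            if score < best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


# Synthetic illustration (hypothetical data): only predictors 0 and 2
# enter the true model; predictors 1 and 3 are pure noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 2] + rng.normal(size=200)
subset_bic, score_bic = all_subsets_ic(X_demo, y_demo, criterion="bic")
print("BIC-selected predictors:", subset_bic)
```

Because the search enumerates every subset, its cost grows as 2^p, so a sketch like this is only practical for a modest number of candidate predictors. Since the fit is plain OLS, a few outliers can change which subset wins, which is the weakness the robust procedure is meant to address.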

The Consistency of Robust Stability Selection
Robust Stability All-Subset Selection Method
Simulation Study
Air Quality Data
Findings
Conclusions and Recommendation