Abstract

Variable selection has become an indispensable part of statistical analysis for high-dimensional datasets. However, classical variable selection algorithms, such as regularization methods, are computationally demanding when both the sample size and the dimension of the dataset are large. Lin, Foster and Ungar (Journal of the American Statistical Association 106 (2011) 232–247) proposed VIF regression, a variable selection algorithm for massive datasets that is computationally efficient and controls the marginal false discovery rate. Building on the idea of VIF regression, we propose a new variable selection algorithm, Double-Gates Streamwise regression (DGS), which quickly tests whether each predictor significantly reduces the prediction error in a single pass over the candidates. DGS regression has two main appealing features. First, it is computationally efficient and has low memory requirements. Second, it controls the false discovery rate, and hence improves both predictive and explanatory performance. Its advantages relative to VIF regression and other popular variable selection algorithms are demonstrated in extensive simulation experiments and the analysis of a real dataset.
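To make the two-gate streamwise idea concrete, below is a minimal Python sketch of a one-pass selection loop: a cheap marginal t-test on the current residual acts as the first gate (a fast screen), and an alpha-investing test on the exact statistic from the refitted model acts as the second gate, which is the step intended to control the false discovery rate. The wealth-update rule and all tuning constants (w0, delta_w, screen_alpha) are hypothetical placeholders for illustration only; they are not the paper's actual DGS test statistics or thresholds.

```python
import numpy as np
from scipy import stats

def double_gate_streamwise(X, y, w0=0.5, delta_w=0.05, screen_alpha=0.10):
    """Illustrative one-pass double-gate streamwise selection (not the
    paper's exact DGS procedure; constants and rules are assumptions)."""
    n, p = X.shape
    selected = []
    wealth = w0                        # alpha-investing wealth budget
    resid = y - y.mean()
    for j in range(p):                 # single pass over candidate predictors
        if wealth <= 0:
            break
        xj = X[:, j] - X[:, j].mean()
        # --- Gate 1: fast marginal t-test against the current residual ---
        beta = xj @ resid / (xj @ xj)
        se = np.sqrt(resid @ resid / (n - 2)) / np.sqrt(xj @ xj)
        p1 = 2 * stats.t.sf(abs(beta / se), df=n - 2)
        if p1 > screen_alpha:
            continue                   # fails the cheap screen, skip refit
        # --- Gate 2: exact t-test in the refitted model, alpha-investing ---
        cols = selected + [j]
        Z = np.column_stack([np.ones(n)] + [X[:, k] for k in cols])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        r = y - Z @ coef
        dof = n - Z.shape[1]
        cov = np.linalg.inv(Z.T @ Z) * (r @ r / dof)
        t2 = coef[-1] / np.sqrt(cov[-1, -1])
        alpha_j = wealth / (2 * (len(selected) + 1))  # spend part of wealth
        if 2 * stats.t.sf(abs(t2), df=dof) <= alpha_j:
            selected.append(j)
            wealth += delta_w          # earn wealth back on a discovery
            resid = r                  # update residual for later screens
        else:
            wealth -= alpha_j / (1 - alpha_j)
    return selected
```

The design point the sketch illustrates is why a two-gate scheme is cheap: the expensive refit behind the second gate is only performed for the small fraction of candidates that survive the first gate, so the pass stays close to O(np) in the common case while the stricter second test guards the false discovery rate.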
