Randomizing outputs to increase variable selection accuracy

Chun-Xia Zhang, Nan-Nan Ji, Guan-Wei Wang

doi:10.1016/j.neucom.2016.08.067

Abstract

Variable selection plays a key role in explanatory modeling; its aim is to identify the variables that are truly important to the outcome. Recently, ensemble learning techniques have shown great potential for improving the performance of traditional methods such as the lasso, genetic algorithms, and stepwise search. Following the main principle for building a variable selection ensemble, we propose in this paper a novel approach that randomizes outputs (i.e., adds random noise to the response) to increase variable selection accuracy. To generate multiple but slightly different importance measures for each variable, Gaussian noise is artificially added to the response. The new training set (i.e., the original design matrix together with the new response vector) is then fed into a genetic algorithm to perform variable selection. By repeating this process over a number of trials and fusing the results by simple averaging, a more reliable importance measure is obtained for each candidate variable. The variables are then ranked and classified as important or unimportant by a thresholding rule. The performance of the proposed method is studied on simulated and real-world data in the framework of linear and logistic regression models. The results demonstrate that it compares favorably with several existing methods.
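The procedure described in the abstract can be sketched roughly as follows for the linear regression case. This is only an illustration under several assumptions that the abstract does not specify: the base selector is a very small genetic algorithm that minimizes BIC, a variable's importance in one trial is taken to be its inclusion frequency in the final population, the noise standard deviation `noise_sd` is a user-chosen value, and the thresholding rule is simply "importance above the mean importance". All function names (`bic`, `ga_select`, `randomized_output_importance`, `select_variables`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def bic(X, y, mask):
    """BIC of the least-squares fit (with intercept) on the columns flagged in mask."""
    n = len(y)
    A = np.column_stack([np.ones(n), X[:, mask]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    rss = float(resid @ resid)
    return n * np.log(rss / n) + A.shape[1] * np.log(n)


def ga_select(X, y, rng, pop_size=20, generations=15, mutation_rate=0.05):
    """Toy genetic algorithm over variable subsets (an assumed stand-in selector).

    Returns, for each variable, its inclusion frequency in the final population.
    """
    p = X.shape[1]
    pop = rng.random((pop_size, p)) < 0.5  # random initial subsets
    n_keep = pop_size // 2
    for _ in range(generations):
        scores = np.array([bic(X, y, m) for m in pop])
        parents = pop[np.argsort(scores)[:n_keep]]  # keep the better half
        # Uniform crossover between randomly paired parents.
        a = parents[rng.integers(n_keep, size=pop_size - n_keep)]
        b = parents[rng.integers(n_keep, size=pop_size - n_keep)]
        children = np.where(rng.random(a.shape) < 0.5, a, b)
        # Bit-flip mutation.
        children ^= rng.random(children.shape) < mutation_rate
        pop = np.vstack([parents, children])
    return pop.mean(axis=0)


def randomized_output_importance(X, y, n_trials=20, noise_sd=1.0, seed=0):
    """Average importance over trials, each run on a noise-perturbed response."""
    rng = np.random.default_rng(seed)
    total = np.zeros(X.shape[1])
    for _ in range(n_trials):
        # Randomize the output: add Gaussian noise to the response only;
        # the design matrix X is left unchanged.
        y_noisy = y + rng.normal(0.0, noise_sd, size=len(y))
        total += ga_select(X, y_noisy, rng)
    return total / n_trials  # fuse the trials by simple averaging


def select_variables(importance, threshold=None):
    """Assumed thresholding rule: keep variables above the mean importance."""
    if threshold is None:
        threshold = importance.mean()
    return importance > threshold


if __name__ == "__main__":
    # Synthetic data: only variables 0, 3, and 6 affect the response.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 8))
    y = 3 * X[:, 0] + 3 * X[:, 3] + 3 * X[:, 6] + rng.normal(size=200)
    importance = randomized_output_importance(X, y, n_trials=10)
    print("importance:", np.round(importance, 2))
    print("selected:  ", np.flatnonzero(select_variables(importance)))
```

In this sketch the ensemble members differ only through the noise added to the response, which is what produces the "multiple but slightly different" importance measures that are then averaged. The logistic regression setting mentioned in the abstract would need a different fitness function and a different way of perturbing a binary response, neither of which is shown here.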

Journal: Neurocomputing
Publication Date: Sep 2, 2016
Citations: 7

Similar Papers

Building variable selection ensembles for linear regression models by adding noise
Guan-Wei Wang ... Chun-Xia Zhang
01 Jul 2015

PBoostGA: pseudo-boosting genetic algorithm for variable ranking and selection
Chun-Xia Zhang ... Sang-Woon Kim
Computational Statistics | VOL. 31
16 Mar 2016

Variable selection in Logistic regression model with genetic algorithm.
Zhongheng Zhang ... Songshi Dai
Annals of Translational Medicine | VOL. 6
01 Feb 2018

Variable Selection in Macroeconomic Forecasting with Many Predictors
Zhenzhong Wang ... Cindy Yu
Econometrics and Statistics
01 Jan 2023
