Abstract

The selection of a descriptor, X, is crucial for improving the interpretation and prediction accuracy of a regression model. In this study, the prediction accuracy of models constructed using the selected X was determined and the results of variable selection, according to the number of selected X and number of selected variables that are unrelated to an objective variable, such as activities and properties (y), were investigated to evaluate the variable or feature selection methods. Variable selection methods include least absolute shrinkage and selection operator, genetic algorithm-based partial least squares, genetic algorithm-based support vector regression, and Boruta. Several regression analysis methods were used to test the prediction accuracy of the model constructed using the selected X. The characteristics of each variable selection method were analyzed using eight datasets. The results showed that even when variables unrelated to y were selected by variable selection and the number of unrelated variables was the same as the number of the original variables, a regression model with good accuracy, which ignores the influence of such noise variables, can be constructed by applying various regression analysis methods. Additionally, the variables related to y must not to be deleted. These findings provide a basis for improving the variable selection methods.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.