Abstract

Ensemble techniques combine multiple individual models into a single model with high predictive performance. They can be applied to both regression and classification problems, but their black-box nature makes the resulting model difficult to interpret. Variable importance measures partly compensate for this shortcoming, but they only express the relative importance of the explanatory variables and cannot confirm whether any individual variable is statistically significant. In a regression model, by contrast, the significance of a variable can be assessed through p-values and penalization. This study proposes a nonparametric variable selection method that tests the significance of explanatory variables based on the variable importance produced by an ensemble technique. A simulation was conducted with eight models, and sensitivity and specificity were compared across methods. In the simulation, the ensemble nonparametric variable selection showed better classification performance than the existing regression-based variable selection across the eight models. In addition, a case analysis confirmed that the random forest nonparametric variable selection showed excellent performance even when the explanatory variables were strongly correlated.
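The abstract does not define how sensitivity and specificity are computed for a variable selection result. The sketch below assumes the usual convention: sensitivity is the share of truly relevant variables that were selected, and specificity is the share of irrelevant variables that were left out. The function name `selection_metrics` and the variable names `x1` to `x10` are hypothetical and chosen only for illustration.

```python
def selection_metrics(true_vars, selected_vars, all_vars):
    """Return (sensitivity, specificity) of a variable-selection result.

    Assumed definitions (not taken from the paper):
      sensitivity = selected relevant variables / all relevant variables
      specificity = unselected irrelevant variables / all irrelevant variables
    """
    relevant = set(true_vars)
    selected = set(selected_vars)
    irrelevant = set(all_vars) - relevant

    true_positives = len(relevant & selected)
    true_negatives = len(irrelevant - selected)

    # Return NaN when a ratio is undefined (no relevant or no irrelevant variables).
    sensitivity = true_positives / len(relevant) if relevant else float("nan")
    specificity = true_negatives / len(irrelevant) if irrelevant else float("nan")
    return sensitivity, specificity


# Hypothetical example: 10 candidate variables, 4 of which are truly relevant.
all_vars = [f"x{i}" for i in range(1, 11)]
true_vars = ["x1", "x2", "x3", "x4"]
selected = ["x1", "x2", "x3", "x7"]  # one relevant variable missed, one irrelevant variable kept

sens, spec = selection_metrics(true_vars, selected, all_vars)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In a simulation, these two values would typically be averaged over repeated data sets for each selection method before the methods are compared.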
