Abstract

Different machine learning models produce different variable importance values even when the same importance method is used, which makes variable importance confusing to interpret. This study proposes combining several variable importance measures using a simulated annealing algorithm, with the mean and the mode as initial solutions. The study uses both simulation and empirical data. The simulation data are divided into three scenarios: no correlation, moderate correlation, and high correlation among the predictor variables. The empirical data consist of 24 predictor variables. The machine learning models are four classification models: random forest, extreme gradient boosting, neural network, and support vector machine. In the simulation study, the combined variable importance is optimal when the predictor variables have low correlation. On the empirical data, the objective value of the simulated annealing algorithm converges around the 25th iteration. The accuracy of the combined variable importance increases with the number of predictor variables and is optimal when the number of predictors exceeds ten. The five most important variables in explaining family food insecurity are the education of the family head, the floor type of the house, the number of family members who have a savings account, ownership of land, and decent drinking water.
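
The combination step described above can be illustrated with a minimal sketch. The abstract does not state the objective function, the neighbourhood move, or the cooling schedule, so everything below is an assumption made for illustration: the objective is taken to be the total absolute disagreement between the combined importance vector and each model's importance vector, the move perturbs one variable's score at a time, the temperature decays geometrically, and only the mean initial solution is shown. The function names and the example scores are hypothetical and are not taken from the study.

```python
import math
import random


def l1_disagreement(candidate, importances):
    """Assumed objective: total absolute gap between the candidate
    vector and every model's importance vector (lower is better)."""
    return sum(
        abs(c - score)
        for row in importances
        for c, score in zip(candidate, row)
    )


def combine_importances(importances, n_iter=200, t0=1.0,
                        cooling=0.95, step=0.05, seed=0):
    """Simulated annealing over a combined importance vector.

    importances: one list of scores per model, all of equal length.
    Returns the best vector found and its objective value.
    """
    rng = random.Random(seed)
    n_models = len(importances)
    n_vars = len(importances[0])

    # Initial solution: per-variable mean across models.
    current = [sum(row[j] for row in importances) / n_models
               for j in range(n_vars)]
    current_cost = l1_disagreement(current, importances)
    best, best_cost = list(current), current_cost

    temp = t0
    for _ in range(n_iter):
        # Neighbour: nudge one randomly chosen variable, keeping it >= 0.
        j = rng.randrange(n_vars)
        candidate = list(current)
        candidate[j] = max(0.0, candidate[j] + rng.uniform(-step, step))
        cost = l1_disagreement(candidate, importances)

        # Metropolis rule: always accept improvements, sometimes accept
        # worse moves with probability exp(-delta / temp).
        delta = cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost

        temp *= cooling  # geometric cooling (assumed schedule)

    return best, best_cost


# Hypothetical importance scores: 4 models (rows) x 3 variables (columns).
scores = [
    [0.5, 0.3, 0.2],
    [0.6, 0.1, 0.3],
    [0.4, 0.4, 0.2],
    [0.1, 0.6, 0.3],
]
combined, cost = combine_importances(scores)
print(combined, cost)
```

Because the search starts from the mean and keeps the best vector seen, the returned objective value is never worse than that of the plain mean. Under this assumed objective the search drifts toward the per-variable median, so a different objective, such as rank agreement between models, would lead to a different combined importance.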
