Abstract

Classification is the process of building a model that can distinguish between different classes of data. The model aims to predict the class of testing data from patterns or relationships learned from training data. One algorithm used to build classification models is Categorical Boosting (CatBoost). In general, however, the resulting models are difficult to interpret. Methods such as SHAP (SHapley Additive exPlanations) are needed to make complex classification models easier to interpret. SHAP explains individual predictions and is based on game-theoretically optimal Shapley values. In this study, a SHAP variable importance analysis was conducted on a CatBoost classification model to identify the variables that characterize the occurrence of food insecurity in households. The data were obtained from the March 2021 Survei Sosial Ekonomi Nasional (Susenas) for Aceh Province, sourced from the Badan Pusat Statistik (BPS), and comprise 13,126 observations. Of the four classification models evaluated on the testing data, the best had accuracy, sensitivity, specificity, and AUC values of 0.703, 0.349, 0.798, and 0.637, respectively. Furthermore, the SHAP variable importance analysis showed that the number of household members who smoke, the education of the household head, wall type, drinking water source, and decent sanitation contributed significantly to the occurrence of food insecurity in households in Aceh Province in 2021.
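To make the idea behind SHAP concrete, the following minimal sketch computes exact Shapley values for a single prediction by enumerating all feature coalitions. It is an illustration only: `toy_model`, the instance `x`, and the `baseline` are hypothetical and are not the study's CatBoost model or the Susenas data, and features absent from a coalition are assumed to be replaced by their baseline value. SHAP implementations for tree ensembles typically rely on faster, tree-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline.

    Assumption: a feature that is absent from a coalition takes its
    baseline value. Cost grows exponentially with the number of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return phi


# Hypothetical toy scoring function with one interaction term
# (a stand-in for a fitted model, not the study's classifier).
def toy_model(z):
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[2]


x = [1.0, 1.0, 1.0]          # instance to explain (hypothetical)
baseline = [0.0, 0.0, 0.0]   # reference point (hypothetical)
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the additivity property that lets SHAP attribute an individual prediction to its variables. Here the interaction term is split equally between the two features involved.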
