Synthetic Minority Over-sampling Technique (SMOTE) for handling imbalanced data in poverty classification

Firza Refo Adi Pratama, Siskarossa Ika Oktora

doi:10.3233/sji-220080

Abstract

Poverty data in official statistics are important for development planning. A declining percentage of poor households recorded each year indicates that a country is developing well. However, inferential and classification analyses of these data run into a recurring problem: the data are imbalanced, which biases the estimation results and causes prediction errors in classification. One solution to this problem is the Synthetic Minority Over-sampling Technique (SMOTE). This study therefore evaluates the quality of inference and classification of the binary logistic regression model without and with SMOTE. The data are the poverty status of households in rural and urban areas of East Java, Indonesia, taken from the 2019 National Socio-Economic Survey. The variables are the poverty status of the household, the age of the household head (HH), the ratio of household members who are employed, the gender of the HH, the number of household members, the education level of the HH, and the occupation of the HH. The study concludes that the model with SMOTE performs better in both inference and classification.
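
As an illustration of the over-sampling step named in the abstract, the following is a minimal sketch of the core SMOTE idea: each synthetic minority observation is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. It is not the authors' implementation. The function name `smote`, the toy values, and the assumption that poor households are the minority class are all hypothetical, and only two numeric variables are used because plain interpolation is not meaningful for categorical variables such as gender or occupation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of numeric feature vectors (lists of floats).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # The new point lies at a random position on the segment base -> neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: [age of household head, number of household members]
poor = [[45.0, 6.0], [52.0, 7.0], [38.0, 5.0], [60.0, 8.0]]
non_poor_count = 20  # assumed size of the majority class

# Over-sample the minority class until both classes have the same size
new_points = smote(poor, n_new=non_poor_count - len(poor), k=3)
balanced_poor = poor + new_points
```

In a workflow like the one the abstract describes, the over-sampled data would then be used to fit the binary logistic regression model, and the result compared with the same model fitted to the original data.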

Full Text