Poverty prediction using E-commerce dataset and filter-based feature selection approach

Dedy Rahman Wijaya, Raden Ilham Fadhilah Ibadurrohman, Elis Hernawati, Wawa Wikusna

doi:10.1038/s41598-024-52752-7


Open Access

https://doi.org/10.1038/s41598-024-52752-7


Journal: Scientific Reports
Publication Date: Feb 7, 2024
License type: CC BY 4.0

Affiliation: Telkom University

Abstract

Poverty is a problem in many countries, notably Indonesia. Poverty information is commonly obtained through surveys and censuses, but these take a long time and require substantial human resources. Governments and policymakers, meanwhile, need a faster way to assess socio-economic conditions for regional development planning. In this paper, we therefore use e-commerce data and machine learning algorithms as a proxy for poverty levels that can deliver information faster than surveys or censuses. Because the e-commerce dataset is high-dimensional, feature selection algorithms are employed to determine the best features before building a machine learning model. Three machine learning algorithms (support vector regression, linear regression, and k-nearest neighbor) are then compared for predicting the poverty rate. The contribution of this paper is a combination of statistical-based feature selection and machine learning algorithms for predicting the poverty rate from e-commerce data. In the experimental results, the combination of f-score feature selection and support vector regression outperforms the other methods, which suggests that e-commerce data and machine learning algorithms can potentially serve as a proxy for predicting poverty.
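
The filter-based selection step described in the abstract can be illustrated with a small sketch. It assumes that "f-score" means the univariate regression F-statistic, F = r^2 / (1 - r^2) * (n - 2), where r is the Pearson correlation between one feature and the target; the abstract does not define the score, so this is an assumption. The data are synthetic, and the names pearson_r, f_score, and select_top_k are hypothetical rather than taken from the paper.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return sxy / math.sqrt(sxx * syy)


def f_score(xs, ys):
    """Univariate F-statistic of one feature against the target."""
    r2 = min(pearson_r(xs, ys) ** 2, 1.0 - 1e-12)  # guard against r^2 == 1
    return r2 / (1.0 - r2) * (len(ys) - 2)


def select_top_k(X, y, k):
    """Score every column of X and return the scores and the k best column indices."""
    columns = list(zip(*X))
    scores = [f_score(col, y) for col in columns]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return scores, sorted(ranked[:k])


# Synthetic stand-in for a high-dimensional table (NOT the paper's data):
# 200 regions, 6 features, target driven only by features 0 and 3.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(6)] for _ in range(200)]
y = [3 * row[0] - 2 * row[3] + random.gauss(0, 0.5) for row in X]

scores, selected = select_top_k(X, y, k=2)
print("F-scores:", [round(s, 1) for s in scores])
print("selected feature indices:", selected)
```

In a full pipeline, only the selected columns would be passed on to the regressors being compared (support vector regression, linear regression, k-nearest neighbor), with the number of retained features k treated as a tunable choice.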

Full Text