Abstract
Poverty remains one of the main problems in economic development, alongside inequality, unemployment, and economic growth. This study models poverty status directly with discrete choice models, specifically machine learning classification methods. The data are imbalanced: one category is small relative to the other, so a resampling scheme that combines oversampling and undersampling (both sampling) is used. Several machine learning methods were applied: Decision Tree, Naïve Bayes, K-Nearest Neighbor (KNN), and Rotation Forest. The results show that combined resampling gives optimal results for all four methods. Judged by accuracy, specificity, sensitivity, AUC, and the Kappa coefficient, the best method is KNN. The KNN model has an accuracy of 0.73, sensitivity of 0.68, specificity of 0.78, and AUC of 0.73.
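The indicators named above can all be derived from a two-by-two confusion matrix. The following is a minimal sketch, not the study's code: the function name and the counts are hypothetical, chosen only so that the resulting rates match the reported KNN accuracy, sensitivity, and specificity on a balanced set. The Kappa value it yields is a property of these made-up counts, not a result from the study.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, sensitivity, specificity, and Cohen's kappa.

    tp/fn count the positive class (here: poor households),
    fp/tn count the negative class (non-poor households).
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate

    # Agreement expected by chance, from the row and column marginals.
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((fp + tn) / n) * ((fn + tn) / n)
    p_chance = p_pos + p_neg
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only (not the study's confusion matrix).
metrics = classification_metrics(tp=68, fn=32, fp=22, tn=78)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Kappa corrects accuracy for the agreement a classifier would reach by chance, which is why it is reported alongside accuracy when the classes are imbalanced.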
Highlights
Poverty remains one of the main problems in economic development, alongside inequality, unemployment, and economic growth.
This study models poverty status directly with discrete choice models, specifically machine learning classification methods.
The results show that resampling with combined oversampling and undersampling (both sampling) gives optimal results for all four machine learning methods.
Summary
Poverty remains one of the main problems in economic development, alongside inequality, unemployment, and economic growth. The Decision Tree method produces models that are easy to integrate into database systems, has good accuracy, and can uncover unexpected combinations of data. Kurnia [8] classified poverty using the KNN method with an accuracy of up to 90 percent. King and Zeng [11] state that when a classification method is applied to imbalanced data, the classifier tends to nullify the probability of the minority class because predictions lean toward the majority class, so the resulting classification accuracy is poor. This study examines and applies several machine learning methods, namely Decision Tree (DT), Naïve Bayes (NB), KNN, and Rotation Forest (RF), while accounting for imbalanced data and a large data set. The scheme splits the data deterministically (holdout) and resamples with undersampling and oversampling combined (both/combined sampling) to model the classification of household poverty status in Indonesia.
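The holdout split followed by combined resampling can be sketched as below. This is a simplified illustration under stated assumptions, not the study's procedure: the function names, the 30 percent test fraction, the equal class target, and the toy data are all hypothetical. The minority class is oversampled with replacement and the majority class is undersampled without replacement until each fills half of the training set, while the test split is left untouched.

```python
import random


def holdout_split(rows, test_fraction=0.3, seed=0):
    """Shuffle once and split into (train, test); the fraction is an assumption."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def resample_both(train, label_of, minority, seed=0):
    """Balance a training set by over- and undersampling at the same time.

    The total size stays close to len(train); each class ends up with
    len(train) // 2 rows.
    """
    rng = random.Random(seed)
    minority_rows = [r for r in train if label_of(r) == minority]
    majority_rows = [r for r in train if label_of(r) != minority]
    target = len(train) // 2

    # Oversample the minority class: draw with replacement.
    over = [rng.choice(minority_rows) for _ in range(target)]
    # Undersample the majority class: draw without replacement.
    under = rng.sample(majority_rows, min(target, len(majority_rows)))

    combined = over + under
    rng.shuffle(combined)
    return combined


# Toy data: 100 "poor" and 900 "nonpoor" households (hypothetical).
rows = [(i, "poor") for i in range(100)] + [(i, "nonpoor") for i in range(100, 1000)]
train, test = holdout_split(rows)
balanced = resample_both(train, label_of=lambda r: r[1], minority="poor")

n_poor = sum(1 for r in balanced if r[1] == "poor")
n_nonpoor = len(balanced) - n_poor
print(len(train), len(test), n_poor, n_nonpoor)
```

Resampling only the training split keeps the test set at its original class proportions, so the reported sensitivity and specificity reflect performance on data the resampling never touched.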