Abstract

Poverty remains one of the main problems in economic development, alongside inequality, unemployment, and economic growth. This study models poverty directly as a discrete choice problem using machine learning classification methods. The data are imbalanced, with one category much smaller than the other, so a resampling scheme that combines undersampling and oversampling (both/combined sampling) is used. Four machine learning methods were applied: Decision Tree, Naïve Bayes, K-Nearest Neighbor (KNN), and Rotation Forest. The results show that combined resampling gives the best results for all four methods. Judged by accuracy, specificity, sensitivity, AUC, and the Kappa coefficient, the best method is KNN. The KNN model has an accuracy of 73 percent, a sensitivity of 68 percent, a specificity of 78 percent, and an AUC of 0.73.

Highlights

  • Poverty remains one of the main problems in economic development, alongside inequality, unemployment, and economic growth

  • This study models poverty directly as a discrete choice problem using machine learning classification methods

  • The results show that resampling with combined undersampling and oversampling (both sampling) gives the best results for all four machine learning methods

Summary

INTRODUCTION poverty in 2019 based on Susenas data

Poverty remains one of the main problems in economic development, alongside inequality, unemployment, and economic growth. The Decision Tree (DT) method can integrate a simple model into database systems, has good accuracy, and can uncover unexpected combinations of data. Kurnia [8] classified poverty using the KNN method with an accuracy of up to 90 percent. King and Zeng [11] state that when classification methods are applied to imbalanced data, the classifier tends to nullify the probability of the minority class because predictions lean toward the majority class, so the resulting classification accuracy is poor. This study examines and applies several machine learning methods, namely Decision Tree (DT), Naïve Bayes (NB), KNN, and Rotation Forest (RF), while taking into account imbalanced data and a large data set. The scheme splits the data deterministically (holdout) and resamples with a combination of undersampling and oversampling at once (both/combined sampling) to model the classification of household poverty status in Indonesia.
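To make the combined ("both") resampling idea and the evaluation indicators concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names `both_sample` and `binary_metrics`, the 50/50 target share, and the toy labels are all assumptions made for illustration, and the resampling is assumed to be applied only to the training part of a holdout split.

```python
import random
from collections import Counter


def both_sample(labels, minority, n_total=None, p_minority=0.5, seed=0):
    """Return row indices for a resampled training set (hypothetical helper).

    The minority class is oversampled (drawn with replacement) and the
    majority class is undersampled (drawn without replacement), so that
    about p_minority of the n_total returned indices are minority rows.
    """
    rng = random.Random(seed)
    n_total = len(labels) if n_total is None else n_total
    min_idx = [i for i, y in enumerate(labels) if y == minority]
    maj_idx = [i for i, y in enumerate(labels) if y != minority]
    n_min = round(n_total * p_minority)
    n_maj = n_total - n_min
    # Oversampling: minority rows may be repeated.
    new_min = rng.choices(min_idx, k=n_min)
    # Undersampling: each majority row is used at most once.
    new_maj = rng.sample(maj_idx, k=min(n_maj, len(maj_idx)))
    resampled = new_min + new_maj
    rng.shuffle(resampled)
    return resampled


def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity and Cohen's kappa from a 2x2 table."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # Expected agreement by chance, from the row and column margins.
    p_e = ((tp + fn) / n) * ((tp + fp) / n) + ((fp + tn) / n) * ((fn + tn) / n)
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Toy, made-up labels: 10 "poor" households out of 100.
toy_labels = ["poor"] * 10 + ["non-poor"] * 90
toy_idx = both_sample(toy_labels, minority="poor")
print(Counter(toy_labels[i] for i in toy_idx))
```

With the default settings, the toy set of 10 "poor" and 90 "non-poor" labels is rebalanced to 50 of each. A classifier such as KNN would then be fitted on the resampled rows and scored on the untouched holdout rows with `binary_metrics`.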

Modelling
Decision Tree
Naïve Bayes
K-Nearest Neighbor
Rotation Forest
Evaluation
Stages of Data Analysis
RESULTS AND DISCUSSION
Model Selection
Findings
CONCLUSION
