Decision Tree C4.5 Performance Improvement using Synthetic Minority Oversampling Technique (SMOTE) and K-Nearest Neighbor for Debtor Eligibility Evaluation

Edi Priyanto, Enny Itje Sela, Noourul Islam, Luther Alexander Latumakulita

doi:10.33096/ilkom.v15i2.1676.373-381

Open Access

https://doi.org/10.33096/ilkom.v15i2.1676.373-381

Abstract

Information technology, especially machine learning, is now used to evaluate the eligibility of debtors. One challenge for such classification models is class imbalance, which is present in the German Credit Dataset. Another is developing an optimal model for evaluating debtor eligibility. To address both, this study develops a model for evaluating debtor eligibility on the German Credit Dataset using a decision tree, k-Nearest Neighbor (k-NN), and the Synthetic Minority Oversampling Technique (SMOTE). SMOTE and k-NN are used to address the class imbalance, while the decision tree is used to produce the debtor classification model. The workflow consists of preparing the dataset, pre-processing the data, splitting the dataset, oversampling with SMOTE, building the classification model with a decision tree, and testing. Model performance is measured by accuracy, obtained from the confusion matrix, and by the area under the curve (AUC) of the Receiver Operating Characteristic (ROC). In testing, the best result was an accuracy of 73.00% and an AUC of 0.708, obtained with k = 3 and Max-Depth = 25. The analysis indicates that the proposed model performs better than the same model trained without SMOTE.
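
The oversampling step named in the abstract can be illustrated with a minimal sketch of the general SMOTE idea: pick a minority-class sample, find its k nearest minority-class neighbours, and create a synthetic sample at a random point on the line segment between the sample and one of those neighbours. This is not the authors' implementation. The function name `smote`, the toy data, and the use of Euclidean distance on purely numeric features are assumptions made for illustration; the German Credit Dataset also contains categorical attributes, which would need encoding first. The value k = 3 mirrors the abstract, on the assumption that it refers to the neighbour count used during oversampling.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples from a list of minority-class feature vectors.

    Hypothetical helper for illustration; assumes numeric features only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the base sample (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # New sample lies on the segment between the base sample and the neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy numeric data (hypothetical, not taken from the German Credit Dataset)
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority_count = 10  # assumed size of the majority class

# Oversample until the minority class matches the majority class in size
new_points = smote(minority, n_new=majority_count - len(minority), k=3)
balanced_minority = minority + new_points
```

In a pipeline like the one described, oversampling of this kind would normally be applied to the training split only, so that the test split still reflects the original class distribution.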

Full Text