Abstract

One of the problems that often arises in classification analysis is unbalanced data. It causes misclassification and reduces sensitivity, especially for the minority class. Unbalanced data can be handled using the Synthetic Minority Oversampling Technique (SMOTE). In addition, ensemble methods are used in the classification process because they can improve classification performance. This study evaluates ensemble methods combined with SMOTE to address the problem. The data used in this study include the balance-scale, nursery, red wine quality, internet firewall, and Air Pollution Index datasets. The study focuses on random forest and AdaBoost as ensemble methods, with k-Nearest Neighbor (KNN) and decision tree as single-classifier baselines for comparison. Results are evaluated by comparing the ensemble and single-classifier methods on accuracy, sensitivity, and specificity, both before and after applying SMOTE. The evaluation across the five datasets shows that the ensemble methods tend to perform better than the decision tree and KNN. Data processed with SMOTE yields better sensitivity, especially for the minority class.
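
To illustrate the general approach described above, the sketch below combines SMOTE oversampling with a random forest and compares it against a plain decision tree. This is a minimal illustration, not the authors' implementation: it assumes the Python scikit-learn and imbalanced-learn libraries and uses a synthetic stand-in dataset, since the paper's five datasets are not reproduced here.

```python
# Minimal sketch: SMOTE + random forest vs. a plain decision tree on a
# synthetic imbalanced dataset (stand-in for the paper's datasets).
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic imbalanced data: roughly 90% majority class, 10% minority class.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

models = {
    "decision tree (no SMOTE)": DecisionTreeClassifier(random_state=42),
    "SMOTE + random forest": Pipeline([
        # SMOTE is applied only when fitting, so the test set stays untouched.
        ("smote", SMOTE(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
    ]),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name)
    # Per-class recall here corresponds to the paper's "sensitivity".
    print(classification_report(y_test, model.predict(X_test), digits=3))
```

In this setup, the minority-class recall reported by `classification_report` is the quantity the abstract refers to as sensitivity; the pipeline ensures the synthetic oversampling is restricted to the training data.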
