Abstract
One problem that often arises in classification analysis is unbalanced data. Class imbalance causes misclassification and lowers sensitivity, especially for the minority class. Unbalanced data can be handled with the Synthetic Minority Oversampling Technique (SMOTE). Ensemble methods are also used for classification because they can improve classification performance. This study evaluates ensemble methods combined with SMOTE as a way of dealing with unbalanced data. The data used in this study are the balance-scale, nursery, red wine quality, internet firewall, and Air Pollution Index datasets. The study focuses on random forest and AdaBoost as ensemble methods and, as a baseline for comparison, on k-Nearest Neighbor (KNN) and decision tree as single classifiers. The results are evaluated by comparing the ensemble and single-classifier methods on accuracy, sensitivity, and specificity, before and after applying SMOTE. Across the five datasets, the ensemble methods tend to perform better than decision tree and KNN. Data processed with SMOTE yielded better sensitivity, especially for the minority class.
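To make the two central ideas concrete, the following is a minimal sketch, not the study's implementation, of SMOTE-style oversampling and of per-class sensitivity and specificity. The function names `smote` and `sensitivity_specificity`, the default `k=5`, and the random choice of base sample are assumptions made for illustration; the original SMOTE procedure generates a fixed number of synthetic points per minority sample.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a minority sample
    and one of its k nearest minority-class neighbours (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec
```

In a comparison like the one described, the oversampling step would be applied to the training split only, and the metrics would be computed per class on untouched test data, so that synthetic points never leak into the evaluation.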