Abstract

In the current digital era, machine learning is a technology in high demand among organizations and individuals, and the ability to process data efficiently has become essential. As the amount of data grows, various problems arise in machine learning; one of them is class imbalance, which is found increasingly often. Class imbalance is a condition in which one class dominates another, for example when the positive class contains far fewer samples than the negative class. The class with fewer samples is categorized as the minority class, while the class that dominates the dataset is called the majority class. Class imbalance can degrade classification performance, so imbalanced classes must be handled to improve classification results. Classifying imbalanced data with Random Forest yields satisfactory results, as does applying SMOTE and ADASYN as sampling methods, since both are popular and easy to implement. The best model produced in this study applies SMOTE oversampling to a dataset with a 10% imbalance ratio (IR), achieving a balanced accuracy of 98.75%; the best result with ADASYN oversampling is obtained on a dataset with a 13% IR, with a balanced accuracy of 99.03%.

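The pipeline summarized above (oversample the minority class with SMOTE or ADASYN, then train a Random Forest and evaluate with balanced accuracy) can be sketched roughly as follows using scikit-learn and imbalanced-learn; the synthetic dataset, split, and hyperparameters are illustrative assumptions, not the study's actual setup.

```python
# Minimal sketch, assuming scikit-learn and imbalanced-learn are installed;
# the synthetic data below stands in for the study's dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE, ADASYN

# Synthetic imbalanced data: roughly 10% minority class (placeholder ratio).
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

for name, sampler in [("SMOTE", SMOTE(random_state=42)),
                      ("ADASYN", ADASYN(random_state=42))]:
    # Oversample only the training split, then fit a Random Forest classifier.
    X_res, y_res = sampler.fit_resample(X_train, y_train)
    clf = RandomForestClassifier(random_state=42).fit(X_res, y_res)
    score = balanced_accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: balanced accuracy = {score:.4f}")
```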