Abstract

Many practical classification datasets are imbalanced, meaning that one class holds a majority of the samples compared to the others. Such class-imbalanced datasets arise in many real-world settings, where the number of samples in one class differs greatly from the number in the other class. Building good classification models on these datasets is difficult, particularly when it comes to separating the minority class from the majority class. To address class imbalance, undersampling and oversampling procedures are used to reduce the number of majority-class samples and increase the number of minority-class samples, respectively. This paper explores a combination of oversampling and undersampling techniques, namely the synthetic minority oversampling technique (SMOTE) and the neighborhood cleaning rule (NCL), to balance the datasets. Performance is evaluated using two machine learning algorithms and assessed with the recall measure and the geometric mean, which show improved performance of the algorithms.

Keywords: Class imbalance problem, Undersampling, Oversampling, SMOTE, NCL, Minority class, Majority class, Binary classification
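To illustrate the kind of resampling pipeline the abstract describes, the sketch below combines SMOTE oversampling with NCL undersampling before training a classifier. It is only a minimal illustration: the paper does not specify its implementation here, so the library (imbalanced-learn), the synthetic dataset, and the two classifiers (decision tree and random forest) are assumptions chosen for demonstration.

# Illustrative sketch only: combines SMOTE oversampling with NCL undersampling
# as described in the abstract. The dataset, classifiers, and parameters below
# are assumptions, not the paper's actual experimental setup.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import NeighbourhoodCleaningRule
from imblearn.pipeline import Pipeline
from imblearn.metrics import geometric_mean_score

# Synthetic binary dataset with a roughly 9:1 class imbalance.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42)

for name, clf in [("DecisionTree", DecisionTreeClassifier(random_state=42)),
                  ("RandomForest", RandomForestClassifier(random_state=42))]:
    # Oversample the minority class with SMOTE, then remove noisy/boundary
    # majority samples with the neighborhood cleaning rule (NCL).
    model = Pipeline(steps=[
        ("smote", SMOTE(random_state=42)),
        ("ncl", NeighbourhoodCleaningRule()),
        ("clf", clf),
    ])
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(name,
          "recall (minority):", round(recall_score(y_test, y_pred), 3),
          "G-mean:", round(geometric_mean_score(y_test, y_pred), 3))

The geometric mean reported at the end is the square root of the product of the per-class recalls, so it penalizes a model that performs well on the majority class but poorly on the minority class.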
