Abstract

Many practical classification datasets are imbalanced, meaning that one class holds a majority of the samples compared to the others. Such class-imbalanced datasets arise in many real-world settings, where the number of samples in one class differs greatly from the number in the other class. Building good classification models on these datasets is difficult, particularly when it comes to separating the minority class from the majority class. To address class imbalance, undersampling and oversampling procedures are used to reduce the number of majority-class samples and increase the number of minority-class samples, respectively. This paper explores a combination of oversampling and undersampling techniques, namely the synthetic minority oversampling technique (SMOTE) and the neighborhood cleaning rule (NCL), to balance the datasets. Performance is evaluated using two machine learning algorithms and assessed with the recall measure and the geometric mean, which show improved performance of the algorithms.

Keywords: Class imbalance problem, Undersampling, Oversampling, SMOTE, NCL, Minority class, Majority class, Binary classification
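To illustrate the kind of resampling pipeline the abstract describes, the sketch below combines SMOTE oversampling with NCL undersampling before training a classifier. It is only a minimal illustration: the paper does not specify its implementation here, so the library (imbalanced-learn), the synthetic dataset, and the two classifiers (decision tree and random forest) are assumptions chosen for demonstration.

# Illustrative sketch only: combines SMOTE oversampling with NCL undersampling
# as described in the abstract. The dataset, classifiers, and parameters below
# are assumptions, not the paper's actual experimental setup.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import NeighbourhoodCleaningRule
from imblearn.pipeline import Pipeline
from imblearn.metrics import geometric_mean_score

# Synthetic binary dataset with a roughly 9:1 class imbalance.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42)

for name, clf in [("DecisionTree", DecisionTreeClassifier(random_state=42)),
                  ("RandomForest", RandomForestClassifier(random_state=42))]:
    # Oversample the minority class with SMOTE, then remove noisy/boundary
    # majority samples with the neighborhood cleaning rule (NCL).
    model = Pipeline(steps=[
        ("smote", SMOTE(random_state=42)),
        ("ncl", NeighbourhoodCleaningRule()),
        ("clf", clf),
    ])
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(name,
          "recall (minority):", round(recall_score(y_test, y_pred), 3),
          "G-mean:", round(geometric_mean_score(y_test, y_pred), 3))

The geometric mean reported at the end is the square root of the product of the per-class recalls, so it penalizes a model that performs well on the majority class but poorly on the minority class.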
