Abstract

Classification is the process of predicting a label for a given set of inputs. This task becomes difficult when the dataset is imbalanced. Most existing machine learning classifiers struggle with imbalanced data because it biases them heavily towards the majority class, and this bias can reduce accuracy on minority class predictions. Data oversampling is one of the most important solutions for balancing data, particularly when the dataset is small and/or imbalanced. The Synthetic Minority Over-sampling Technique (SMOTE), Borderline-SMOTE, Adaptive Synthetic sampling (ADASYN), and Weighted SMOTE (W-SMOTE) are the most popular oversampling techniques. However, the main drawback of SMOTE and ADASYN is that they increase the overlap between classes, so the produced samples are not representative of the original data distribution, while Borderline-SMOTE may neglect some important samples when producing new ones. To overcome these problems in existing oversampling techniques, in this paper we propose a new data oversampling method that relies on convex combinations to generate new samples of the minority class. The convex combination allows us to produce new samples that follow the original data distribution. We evaluated our approach on four standard imbalanced datasets (Yeast, Glass Identification, Paw, and Wisconsin Prognosis Breast Cancer (WPBC)). The experimental results show that our proposed method gives better performance in terms of accuracy, precision, recall, F1-measure, and area under the curve (AUC).
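The core idea of the abstract can be illustrated with a small sketch. The abstract does not specify the exact algorithm, so the function below is a hypothetical illustration, not the authors' implementation: each synthetic minority sample is built as a convex combination (random non-negative weights summing to 1) of a few randomly chosen minority samples, which guarantees the new sample lies inside the convex hull of the minority class.

```python
import numpy as np

def convex_combination_oversample(X_min, n_new, k=3, seed=None):
    """Hypothetical sketch of convex-combination oversampling.

    X_min : 2-D array of minority-class samples (rows are samples).
    n_new : number of synthetic samples to generate.
    k     : number of minority samples combined per synthetic point.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        # Pick k distinct minority samples at random.
        idx = rng.choice(len(X_min), size=k, replace=False)
        # Random convex weights: non-negative and summing to 1.
        w = rng.random(k)
        w /= w.sum()
        # Weighted average of the chosen samples = convex combination.
        synthetic[i] = w @ X_min[idx]
    return synthetic
```

Because every generated point is a weighted average of real minority samples, it stays within their convex hull, which is why this style of oversampling avoids producing samples outside the region occupied by the original minority data.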
