Class imbalance is a crucial problem in machine learning in which one class of a data set significantly outnumbers the others. Standard classification algorithms struggle on such data sets and become biased towards the majority class. Several families of methods have been developed to deal with class imbalance, including oversampling, undersampling, and cost-sensitive learning. Among these, oversampling avoids both the information loss of undersampling and the critical cost selection of cost-sensitive learning. However, generating appropriate synthetic samples can be challenging and is vulnerable to privacy leakage. This research proposes an oversampling technique, called CARBO, that uses threshold-based geometric rotation and majority-class-influenced clustering. Unlike existing resampling approaches to the class imbalance problem, our contribution considers data privacy and optimal sample generation together for effective oversampling. The performance of CARBO is evaluated on 44 benchmark imbalanced data sets. The empirical analysis shows that, with CARBO, boosting-based C4.5 ensemble classifiers outperform six state-of-the-art approaches on 73% of the data sets. In addition, the theoretical compatibility analysis of CARBO demonstrates its robustness.
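To make the idea of rotation-based oversampling concrete, the following is a minimal sketch and not the CARBO algorithm itself: the abstract does not specify the threshold rule or the majority-class-influenced clustering, so both are omitted. The function names `imbalance_ratio` and `rotate_oversample`, the `max_angle` parameter, and the choice to rotate around the minority centroid are all assumptions made for illustration.

```python
import math
import random


def imbalance_ratio(labels):
    """Return the majority class count divided by the minority class count."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return max(counts.values()) / min(counts.values())


def rotate_oversample(minority, n_new, max_angle=math.pi / 12, seed=0):
    """Create n_new synthetic points by rotating minority samples.

    Illustrative assumption, not the published method: each synthetic
    point is a randomly chosen minority sample rotated around the
    minority centroid, within a randomly chosen pair of feature axes,
    by a random angle in [-max_angle, max_angle].
    Requires at least two features per sample.
    """
    rng = random.Random(seed)
    dim = len(minority[0])
    centroid = [sum(p[k] for p in minority) / len(minority) for k in range(dim)]

    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        i, j = rng.sample(range(dim), 2)  # plane of rotation
        theta = rng.uniform(-max_angle, max_angle)
        dx, dy = p[i] - centroid[i], p[j] - centroid[j]
        q = list(p)  # copy, so the original sample is never modified
        q[i] = centroid[i] + dx * math.cos(theta) - dy * math.sin(theta)
        q[j] = centroid[j] + dx * math.sin(theta) + dy * math.cos(theta)
        synthetic.append(q)
    return synthetic
```

Because a rotation preserves distances, every synthetic point in this sketch lies at the same distance from the minority centroid as the sample it was derived from, while its coordinates differ from the original record whenever the angle is nonzero. Whether and how CARBO uses this property for privacy is described in the full paper, not here.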