Abstract
Class imbalance, where one class significantly outnumbers the others in a data set, is a crucial problem in machine learning. Standard classification algorithms struggle on such data sets because they become biased towards the majority class. Several methods have been developed to deal with class imbalance, including oversampling, undersampling, and cost-sensitive learning. Among these, oversampling avoids both the information loss of undersampling and the critical cost selection of cost-sensitive learning. However, generating appropriate synthetic samples can be challenging, and the process can be vulnerable to privacy leakage. This research proposes an oversampling technique, called CARBO, that uses threshold-based geometric rotation and majority-class-influenced clustering. Unlike existing resampling approaches to the class imbalance problem, CARBO considers data privacy and optimal sample generation together for effective oversampling. The performance of CARBO is evaluated on 44 benchmark imbalanced data sets. The empirical analysis shows that, with CARBO, boosting-based C4.5 ensemble classifiers outperform six state-of-the-art approaches on 73% of the data sets. In addition, the theoretical compatibility analysis of CARBO demonstrates its robustness.