Abstract

In a class-imbalanced data set, one class contains far more instances than the other, which is a critical problem in data mining. Many approaches, such as oversampling, undersampling, and cost-sensitive methods, have been developed to mitigate the effects of class imbalance, but each suffers from various shortcomings. In existing work, researchers have rarely applied normalization to the imbalanced data set to mitigate these effects. In this work, we implemented two state-of-the-art data balancing methods, Random Undersampling (RUS) and Random Oversampling (ROS), ensembled with the AdaBoost algorithm. We then investigated and compared these two methods against a recently developed approach called the Random Splitting data balancing (SplitBal) method, with and without normalization applied to the imbalanced data set. For normalization, three well-known techniques are used: min-max, z-score, and robust-scaling normalization. The approach of interest, SplitBal, is an ensemble method that first converts the imbalanced data set into several balanced data sets; from these, multiple classification models are built and combined using the max ensemble rule. An empirical analysis on fifteen imbalanced data sets shows that SplitBal with min-max normalization outperforms the other data balancing methods considered in this work when a Random Forest classifier is used.
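The core of the SplitBal idea described above can be sketched as follows: partition the majority class into chunks roughly the size of the minority class, pair each chunk with the full minority set to obtain several balanced training subsets, and combine the per-subset model scores with the max ensemble rule. The function names and details below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the SplitBal scheme: split the majority class
# into minority-sized chunks, pair each with the minority class to form
# balanced subsets, and merge per-subset scores with the max rule.
import random

def split_bal_subsets(majority, minority, seed=0):
    """Partition `majority` into minority-sized chunks; each chunk plus
    the full minority set forms one balanced training subset."""
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    k = len(minority)
    chunks = [shuffled[i:i + k] for i in range(0, len(shuffled), k)]
    # Each subset pairs one majority chunk with the whole minority set.
    return [(chunk, minority) for chunk in chunks]

def max_rule(score_lists):
    """Max ensemble rule: for each test instance, keep the highest
    score produced by any of the per-subset models."""
    return [max(scores) for scores in zip(*score_lists)]
```

In practice one classifier (e.g. a Random Forest) would be trained on each balanced subset, and `max_rule` would be applied to the per-model probability scores for each test instance.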
