Abstract
The class-imbalance problem is widespread across research fields. The larger the data set and the more severe the imbalance, the harder it is to classify correctly. For large-scale, highly imbalanced data sets, ensemble methods based on under-sampling are among the most competitive existing techniques. However, they are sensitive to improper sampling strategies, tend to lose useful information from the majority class, and yield learning models that do not generalize easily. To overcome these limitations, we propose an equalization ensemble method (EASE) with two new schemes. First, we propose an equalization under-sampling scheme that generates a balanced data set for each base classifier, which reduces the impact of class imbalance on the base classifiers. Second, we design a weighted integration scheme in which the G-mean scores obtained by the base classifiers on the original imbalanced data set are used as the weights. These weights not only let the better-performing base classifiers dominate the final classification decision, but also adapt to imbalanced data sets of different scales while avoiding some extremely poor outcomes. Experimental results on three metrics show that EASE increases the diversity of the base classifiers and outperforms twelve state-of-the-art methods on imbalanced data sets of different scales.
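The weighted integration idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not spell out the equalization under-sampling scheme, so plain random under-sampling stands in for it here. The `CentroidClassifier` toy base learner, the function names, the assumption that label 1 is the minority class, and the rule of thresholding the weighted vote at half the total weight are all hypothetical choices made for illustration. The only parts taken from the text are that each base classifier is trained on a balanced subset and that its weight is its G-mean (the geometric mean of sensitivity and specificity) on the original imbalanced data.

```python
import math
import random


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        return 0.0
    return math.sqrt((tp / pos) * (tn / neg))


def random_undersample(X, y, rng):
    """Stand-in for the equalization scheme: plain random under-sampling.

    Assumes label 1 is the minority class and is no larger than class 0.
    """
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    keep = minority + rng.sample(majority, len(minority))
    return [X[i] for i in keep], [y[i] for i in keep]


class CentroidClassifier:
    """Hypothetical toy base learner: predicts the class with the nearer centroid."""

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))

        return [min((0, 1), key=lambda c: dist(x, self.centroids[c])) for x in X]


def fit_weighted_ensemble(X, y, n_estimators=5, seed=0):
    """Train each member on a balanced subset; weight it by G-mean on the full data."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_estimators):
        X_bal, y_bal = random_undersample(X, y, rng)
        clf = CentroidClassifier().fit(X_bal, y_bal)
        weight = g_mean(y, clf.predict(X))  # scored on the original imbalanced set
        members.append((clf, weight))
    return members


def predict_weighted(members, X):
    """Weighted vote: predict 1 when the weighted share of 1-votes is at least half."""
    total = sum(w for _, w in members)
    preds = [clf.predict(X) for clf, _ in members]
    out = []
    for j in range(len(X)):
        score = sum(w * p[j] for (_, w), p in zip(members, preds))
        out.append(1 if total > 0 and score >= total / 2 else 0)
    return out
```

Scoring each member on the original data rather than on its own balanced subset means a base classifier that happens to draw an unrepresentative majority sample receives a lower weight, which is one way to read the abstract's claim about avoiding extremely poor outcomes.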