Abstract

Many classification tasks suffer from the class imbalance problem, which seriously hinders classifier precision: existing algorithms frequently misclassify new instances into the majority class. Ensemble learning is an effective way to address the imbalance problem, as exemplified by the Splitting Balancing Ensemble (SBE) method, which learns an unbalanced dataset by converting it into multiple balanced subsets on which sub-classifiers are built. However, when learning a highly unbalanced dataset, the SBE generates balanced subsets that are too small, leading to under-fitting. We propose the Distance-based Balancing Ensemble (DBE) method to address this issue and improve the generalization performance of the classification algorithm. The DBE divides a highly unbalanced learning set into multiple unbalanced subsets with a much lower imbalance ratio and then applies a modified adaptive semi-unsupervised weighted oversampling method to each subset to obtain balanced subsets for the sub-classifiers. We further propose the Distance-based Combination Rule (DCR) as a more effective method for combining the ensemble results. Tests on 48 unbalanced datasets from public repositories demonstrate the effectiveness of the DBE model with the DCR. The results show that the DBE-DCR model outperforms other ensemble models.
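To make the splitting-then-balancing ensemble idea concrete, the following Python sketch illustrates the general scheme in a heavily simplified form. It is not the authors' DBE/DCR implementation: the majority class is split by a plain random partition rather than the paper's distance-based division, simple random duplication of minority samples stands in for the adaptive semi-unsupervised weighted oversampling, and plain probability averaging stands in for the Distance-based Combination Rule. All function and parameter names are illustrative.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_splitting_balancing_ensemble(X, y, n_subsets=5, random_state=0):
    """Train one sub-classifier per majority-class partition (binary labels: 1 = minority, 0 = majority)."""
    rng = np.random.default_rng(random_state)
    minority = X[y == 1]
    majority = X[y == 0]
    # Randomly partition the majority class into n_subsets chunks,
    # so each subset has a much lower imbalance ratio than the full set.
    parts = np.array_split(rng.permutation(len(majority)), n_subsets)
    models = []
    for idx in parts:
        X_maj = majority[idx]
        # Include all minority samples and duplicate some at random until the
        # subset is balanced (a crude stand-in for weighted oversampling).
        n_extra = max(len(X_maj) - len(minority), 0)
        extra = minority[rng.choice(len(minority), size=n_extra, replace=True)]
        X_min = np.vstack([minority, extra]) if n_extra else minority
        X_sub = np.vstack([X_maj, X_min])
        y_sub = np.hstack([np.zeros(len(X_maj)), np.ones(len(X_min))])
        models.append(DecisionTreeClassifier(random_state=random_state).fit(X_sub, y_sub))
    return models

def predict_ensemble(models, X):
    """Average the sub-classifiers' minority-class probabilities; a distance-based
    combination rule would instead weight each sub-classifier per test instance."""
    probs = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
    return (probs >= 0.5).astype(int)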
