Abstract

Real-world datasets are often imbalanced, which poses challenges for canonical machine learning algorithms that assume a balanced class distribution. The imbalance problem becomes more complicated when the dataset is multiclass. Although many approaches have been presented for imbalanced learning (IL), research on the multiclass imbalanced problem remains relatively limited. To alleviate these issues, we propose a forest of evolutionary hierarchical classifiers (FEHC) method for multiclass IL (MCIL). FEHC can be seen as a classifier fusion framework with a forest structure: it aggregates several evolutionary hierarchical multiclassifiers (EHMCs) to reduce generalization error. Specifically, a multichromosome genetic algorithm (MCGA) is designed to simultaneously select (sub)optimal features, classifiers, and hierarchical structures when generating these EHMCs. The MCGA adopts a dynamic weighting module to learn difficult classes and promote the diversity of FEHC. We also present the "stratified underbagging" (SUB) strategy to address class imbalance and the "soft tree traversal" (STT) strategy to help FEHC converge faster and to better solutions. We thoroughly evaluate the proposed algorithm on 14 multiclass imbalanced datasets with various properties. Compared with popular and state-of-the-art approaches, FEHC obtains better performance under different evaluation metrics. The code is publicly available on GitHub at https://github.com/CUHKSZ-NING/FEHCClassifier.
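
The abstract does not specify how SUB is implemented. As a rough illustration only, the following sketch shows generic class-stratified under-bagging, in which every bag draws the same number of samples from each class (here, the size of the smallest class). The function name `stratified_underbag` and its parameters are hypothetical and are not taken from the FEHC code base; the actual SUB strategy may differ.

```python
import numpy as np

def stratified_underbag(y, n_bags=3, seed=0):
    """Illustrative sketch (not the FEHC implementation).

    Returns a list of sorted index arrays, one per bag. Each bag holds the
    same number of samples from every class, equal to the minority class size.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_per_class = counts.min()  # undersample every class to the minority size
    bags = []
    for _ in range(n_bags):
        idx = np.concatenate([
            # draw without replacement from the indices of class c
            rng.choice(np.flatnonzero(y == c), size=n_per_class, replace=False)
            for c in classes
        ])
        bags.append(np.sort(idx))
    return bags

# Toy labels with a 50 / 10 / 4 class distribution (made-up data).
y = np.array([0] * 50 + [1] * 10 + [2] * 4)
bags = stratified_underbag(y, n_bags=3)
```

Under these assumptions, each bag is balanced across classes, and a separate base learner could be trained on each bag before the predictions are fused.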
