Abstract

Ensemble learning is among the most effective approaches to imbalanced classification, and diversity is a key factor in ensemble performance. Most existing diversity metrics, such as the Q-statistic, measure diversity from the outputs of the base classifiers. This makes model training expensive, because the base classifiers must be re-trained until satisfactory diversity is reached. We propose a new diversity measure, the Instance Euclidean Distance (IED) metric, which evaluates diversity directly on the training data without training base classifiers and therefore significantly reduces the time cost of measuring diversity. We also propose a new imbalanced ensemble learning algorithm, P-EUSBagging, which combines IED with population-based incremental learning to generate training datasets with maximal data-level diversity, reducing training complexity and improving learning performance. Experimental results show that the diversities measured by IED and by three classifier-based diversity measures have a mean absolute correlation coefficient of 0.94, and that P-EUSBagging significantly reduces training time while improving learning performance in terms of both Geometric Mean (G-Mean) and Area Under the Curve (AUC).
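To illustrate why output-based diversity measures are costly, the following is a minimal sketch, not taken from the paper, of the standard pairwise Q-statistic together with G-Mean. The function names `q_statistic` and `g_mean` are hypothetical, and returning 0.0 when the Q-statistic denominator is zero is an assumed convention. Note that `q_statistic` needs the predictions of two already trained classifiers, which is the cost a data-level measure such as IED is meant to avoid. IED itself is not sketched here because the abstract does not give its formula.

```python
import math


def q_statistic(pred_a, pred_b, y_true):
    """Pairwise Q-statistic from two classifiers' predictions on the same instances."""
    n11 = n00 = n10 = n01 = 0
    for a, b, y in zip(pred_a, pred_b, y_true):
        a_ok, b_ok = (a == y), (b == y)
        if a_ok and b_ok:
            n11 += 1  # both classifiers correct
        elif not a_ok and not b_ok:
            n00 += 1  # both classifiers wrong
        elif a_ok:
            n10 += 1  # only the first classifier correct
        else:
            n01 += 1  # only the second classifier correct
    denom = n11 * n00 + n01 * n10
    if denom == 0:
        return 0.0  # assumed convention for the undefined case
    return (n11 * n00 - n01 * n10) / denom


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the true positive rate and the true negative rate."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_pred):
        if y == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


# Toy labels and predictions, made up purely for illustration.
y_true = [1, 1, 0, 0, 0, 0]
pred_a = [1, 0, 0, 0, 0, 1]
pred_b = [1, 1, 0, 0, 1, 0]

q_ab = q_statistic(pred_a, pred_b, y_true)
gm_a = g_mean(y_true, pred_a)
```

Q ranges from -1 to 1. Values near 1 mean the two classifiers tend to be right and wrong on the same instances (low diversity), while negative values mean they tend to err on different instances.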
