Class imbalance is common in real-world data, and it degrades classification performance because the supervision is biased toward the majority class. Undersampling is an effective resampling approach to class imbalance. Conventional undersampling-based approaches use a single fixed sampling ratio. However, different sampling ratios favor different classes. In this paper, an undersampling-based ensemble framework, MUEnsemble, is proposed. The framework combines weak classifiers trained at different sampling ratios, and it allows a flexible design for weighting the weak classifiers at each sampling ratio. To demonstrate the principle of the design, a uniform weighting function and a Gaussian weighting function are presented. An extensive experimental evaluation shows that MUEnsemble outperforms state-of-the-art undersampling-based and oversampling-based methods in terms of recall, G-mean, F-measure, and ROC-AUC. The evaluation also shows that the Gaussian weighting function is superior to the uniform weighting function, which indicates that the Gaussian weighting function can capture the different preferences of sampling ratios toward classes. An investigation into the effects of the parameters of the Gaussian weighting function shows that these parameters can be chosen with respect to recall, the metric preferred in many real-world applications.
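As a rough illustration of the idea of weighting sampling ratios, the following minimal Python sketch builds undersampled training sets at several ratios and uses a weighting function to decide how many weak classifiers each ratio would receive. It is not the authors' implementation: the function names, the `scale` parameter, the interpretation of a weight as a count of weak classifiers, and the definition of the ratio as majority samples per minority sample are all assumptions made for the example, and the training of the weak classifiers themselves is omitted.

```python
import math
import random


def uniform_weights(n_ratios, count=10):
    """Hypothetical uniform weighting: the same number of weak classifiers per ratio."""
    return [count] * n_ratios


def gaussian_weights(n_ratios, mu, sigma, scale=10):
    """Hypothetical Gaussian weighting over sampling-ratio indices.

    Ratios whose index is close to mu get more weak classifiers;
    every ratio keeps at least one (an assumption of this sketch).
    """
    return [
        max(1, round(scale * math.exp(-((i - mu) ** 2) / (2 * sigma ** 2))))
        for i in range(n_ratios)
    ]


def undersample(majority, minority, ratio, rng):
    """Keep every minority sample and draw about ratio * len(minority)
    majority samples without replacement."""
    k = min(len(majority), max(1, round(ratio * len(minority))))
    return rng.sample(majority, k) + list(minority)


def build_training_sets(majority, minority, ratios, weights, seed=0):
    """Return one (ratio, training_set) pair per weak classifier to be trained."""
    rng = random.Random(seed)
    sets = []
    for ratio, weight in zip(ratios, weights):
        for _ in range(weight):
            sets.append((ratio, undersample(majority, minority, ratio, rng)))
    return sets


if __name__ == "__main__":
    majority = list(range(100))       # stand-in majority-class samples
    minority = list(range(100, 110))  # stand-in minority-class samples
    ratios = [0.5, 1, 2, 3, 4]        # illustrative sampling ratios
    weights = gaussian_weights(len(ratios), mu=2, sigma=1)
    training_sets = build_training_sets(majority, minority, ratios, weights)
    print(weights, len(training_sets))
```

In this sketch, moving `mu` toward lower or higher ratio indices shifts the ensemble toward the ratios that favor one class or the other, which is one way to picture how the parameters of a Gaussian weighting function could trade off metrics such as recall.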