Abstract

Noise and class imbalance are two common characteristics of real-world data that usually degrade the performance of machine learning classification algorithms. Many studies have investigated class imbalance and noise in isolation, but very few have studied their combined effect. In this paper, we propose a robust bagging-based ensemble method that addresses both problems together. The proposed method builds on Balanced Bagging (BB) to balance the bootstraps, but uses a different sampling process in which the probability of selecting an instance depends on its level of hardness, i.e., the probability that the instance is misclassified irrespective of the choice of classifier. The method estimates the hardness of each instance in the training dataset and ensures that the bootstraps are balanced while also containing instances of varying degrees of hardness (Easy, Normal, and Hard). We evaluate the proposed method on 30 synthetic imbalanced datasets with different noise levels and imbalance ratios and compare its performance against BB. The proposed method performs significantly better than BB regardless of the noise level or imbalance ratio. Furthermore, we calculate the Equalized Loss of Accuracy (ELA) to assess the robustness of both methods under different levels of noise. The results indicate that the proposed method is more robust (less affected by noise) than BB. The Wilcoxon signed-rank test shows a significant difference in both performance and robustness between the proposed method and BB, suggesting that representing varying levels of hardness in the bootstraps is a better bootstrapping approach that improves the performance of ensemble methods.
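
The sampling idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how hardness is estimated or how the three levels are mixed, so the sketch assumes that a hardness score in [0, 1] is already available for each instance, that the Easy/Normal/Hard thresholds are 1/3 and 2/3, and that each class draws from its non-empty levels in round-robin order. The names `hardness_bin` and `balanced_hardness_bootstrap` are hypothetical.

```python
import random
from collections import defaultdict


def hardness_bin(h, low=1 / 3, high=2 / 3):
    """Map a hardness score in [0, 1] to a coarse level.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if h < low:
        return "easy"
    if h < high:
        return "normal"
    return "hard"


def balanced_hardness_bootstrap(labels, hardness, rng):
    """Return the indices of one bootstrap with equal class counts.

    Each class contributes as many draws as the smallest class has
    instances, and the draws cycle over that class's non-empty
    easy/normal/hard levels so every available level is represented.
    """
    # Group instance indices by class, then by hardness level.
    groups = defaultdict(lambda: defaultdict(list))
    for i, (y, h) in enumerate(zip(labels, hardness)):
        groups[y][hardness_bin(h)].append(i)

    # Size of the smallest class sets the per-class sample size.
    per_class = min(
        sum(len(idx) for idx in bins.values()) for bins in groups.values()
    )

    sample = []
    for y in sorted(groups):
        levels = [lvl for lvl in ("easy", "normal", "hard") if groups[y][lvl]]
        for k in range(per_class):
            level = levels[k % len(levels)]
            # Sampling with replacement inside the chosen level.
            sample.append(rng.choice(groups[y][level]))
    return sample


# Toy usage: 8 majority instances (class 0) and 3 minority instances (class 1).
labels = [0] * 8 + [1] * 3
hardness = [0.1, 0.2, 0.5, 0.5, 0.9, 0.8, 0.1, 0.4, 0.1, 0.5, 0.9]
bootstrap = balanced_hardness_bootstrap(labels, hardness, random.Random(0))
```

In an ensemble, one such bootstrap would be drawn per base classifier. A different mixing rule, for example weighting the levels unequally, would fit the same interface.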
