Abstract

Class imbalance problem (CIP) exists when the class distribution is not uniform. Many real-world scenarios face CIP which attracted the researcher’s attention to this problem. Training machine learning (ML) models with class imbalanced datasets is a challenging problem. Ensemble methods in ML involve training multiple classifiers, combining or averaging their predictions to come to a final prediction. Specifically designed ensemble-based methods can overcome the difficulty faced by traditional classifiers and can handle the CIP. The performance of 19 ensemble methods for 44 unbalanced datasets is assessed in this paper in order to observe the effects of the class imbalance ratio (CIR). For performance evaluation, we divide these datasets into three categories, i.e., Slightly Imbalance (SI), Moderately Imbalance (MI) and Highly Imbalance (HI) based on CIR. With the proposed perspective, we observe that different ensemble methods perform well in different categories suggesting that the percentage of minority or majority class could be a criterion for the selection of ensemble methods for class imbalance datasets. Moreover, visual representations and different non-parametric statistical tests are also used to have more reliable results.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.