Abstract

One common challenge in classification modeling is the existence of imbalanced classes within the data. If the analysis continues with imbalanced classes, it is probable that the result will demonstrate inadequate performance when forecasting new data. Various approaches exist to rectify this class imbalance issue, such as random oversampling, random undersampling, and the Synthetic Minority Over-sampling Technique for Nominal and Continuous (SMOTE-NC). Each of these methods encompasses distinct techniques aimed at achieving balanced class distribution within the dataset. Comparison of classification performance on imbalanced classes handled by these three methods has never been carried out in previous research. Therefore, this study undertakes an evaluation of classification models (specifically Gradient Boosting, Random Forest, and Extremely Randomized Trees) in the context of imbalanced class data. The results of this research show that the random undersampling method used to balance the class distribution has the best performance on two classification models (Random Forest and Gradient Boosted Tree).

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call