Enhancing classification performance in imbalanced datasets: A comparative analysis of machine learning models

Lindani Dube, Tanja Verster

doi:10.3934/dsfe.2023021

Open Access

Abstract

In the realm of machine learning, where data-driven insights guide decision-making, addressing the challenges posed by class imbalance in datasets has emerged as a crucial concern. The effectiveness of classification algorithms hinges not only on their intrinsic capabilities but also on their adaptability to uneven class distributions, a common issue encountered across diverse domains. This study delves into the intricate interplay between varying class imbalance levels and the performance of ten distinct classification models, unravelling the critical impact of this imbalance on the landscape of predictive analytics. Results showed that random forest (RF) and decision tree (DT) models outperformed others, exhibiting robustness to class imbalance. Logistic regression (LR), stochastic gradient descent classifier (SGDC) and naïve Bayes (NB) models struggled with imbalanced datasets. Adaptive boosting (ADA), gradient boosting (GB), extreme gradient boosting (XGB), light gradient boosting machine (LGBM), and k-nearest neighbour (kNN) models improved with balanced data. Adaptive synthetic sampling (ADASYN) yielded more reliable predictions than the under-sampling (UNDER) technique. This study provides insights for practitioners and researchers dealing with imbalanced datasets, guiding model selection and data balancing techniques. RF and DT models demonstrate superior performance, while LR, SGDC and NB models have limitations. By leveraging the strengths of RF and DT models and addressing class imbalance, classification performance in imbalanced datasets can be enhanced. This study enriches credit risk modelling literature by revealing how class imbalance impacts default probability estimation. The research deepens our understanding of class imbalance's critical role in predictive analytics. Serving as a roadmap for practitioners and researchers dealing with imbalanced data, the findings guide model selection and data balancing strategies, enhancing classification performance despite class imbalance.
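
The under-sampling idea mentioned in the abstract can be illustrated with a minimal sketch. This assumes plain random under-sampling; the abstract does not say which variant the authors used, and the function name `random_undersample`, the seed, and the toy 95/5 default data are hypothetical choices made only for illustration. ADASYN, which synthesises new minority-class points instead of discarding majority-class ones, is not shown here.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class (illustrative sketch)."""
    rng = random.Random(seed)

    # Group the feature rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # The smallest class sets the size every class is cut down to.
    target = min(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        kept = rng.sample(members, target)  # sample without replacement
        out_rows.extend(kept)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical toy data: 95 non-defaults (label 0) and 5 defaults (label 1).
rows = [[i] for i in range(100)]
labels = [0] * 95 + [1] * 5

balanced_rows, balanced_labels = random_undersample(rows, labels)
print(Counter(balanced_labels))
```

In this toy case the balanced set keeps all 5 minority rows and only 5 of the 95 majority rows, so 90 observations are discarded. That loss of information is one plausible reason a synthetic over-sampling method could give more reliable predictions, though the abstract itself does not state the cause.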

Journal: Data Science in Finance and Economics
Publication Date: Jan 1, 2023
Citations: 3
License type: cc-by
