Abstract

This study explores the performance of gradient boosting methods in bankruptcy prediction on a highly imbalanced dataset. We developed several heterogeneous ensemble models based on three popular gradient boosting methods: XGBoost, LightGBM, and CatBoost. The ensemble models were optimized using cross-validation, and results on the hold-out test sets showed that the optimized ensembles not only outperform their base learners but also improve on the state-of-the-art benchmark results for the same dataset. Interestingly, we observed that the data oversampling technique commonly used to address class imbalance had an adverse impact on our ensemble models' performance. This indicates that our models are robust to the imbalanced-dataset problem that typically degrades the classification performance of machine learning models.

