Abstract
The performance of credit scoring models can be compromised when dealing with imbalanced datasets, where the number of defaulted borrowers is significantly lower than that of non-defaulters. To address this challenge, we propose a gradient boosting decision tree with the generalised extreme value distribution model (GEV-GBDT). Our approach replaces the conventional symmetric logistic sigmoid function with the asymmetric cumulative distribution function of the GEV distribution as the activation function. We derive a novel loss function based on the maximum likelihood estimation of the GEV distribution within the boosting framework. This modification allows the model to focus more on the minority class by emphasising the tail of the response curve, and the shape parameter of the GEV distribution offers flexibility in controlling the model’s emphasis on minority samples. We examine the performance of this approach using four real-life loan datasets. The empirical results show that the GEV-GBDT model achieves superior classification performance compared to other commonly used imbalanced learning methods, including the synthetic minority oversampling technique and the cost-sensitive framework. Furthermore, we conduct performance tests on several datasets with varying imbalance ratios and find that GEV-GBDT performs better on extremely imbalanced datasets.
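As a minimal illustration of the idea described above, the following sketch uses the GEV cumulative distribution function as an asymmetric link and forms a Bernoulli negative log-likelihood on top of it. It is not taken from the paper: the function names `gev_cdf`, `gev_nll` and `gev_grad`, the default shape value `xi=-0.25`, the choice of mapping the raw score directly to the default probability, and the clipping constant `eps` are all assumptions made for illustration, and the loss actually derived by the authors may differ in parametrisation and sign conventions.

```python
import numpy as np


def gev_cdf(f, xi=-0.25):
    """GEV CDF evaluated at raw score f, used here as an asymmetric link.

    xi is the shape parameter; xi -> 0 gives the Gumbel limit.
    (Illustrative assumption: location 0 and scale 1.)
    """
    f = np.asarray(f, dtype=float)
    if abs(xi) < 1e-12:
        return np.exp(-np.exp(-f))  # Gumbel case
    t = 1.0 + xi * f
    # Outside the support the CDF equals 0 (xi > 0) or 1 (xi < 0).
    outside = 0.0 if xi > 0 else 1.0
    safe_t = np.where(t > 0, t, 1.0)  # avoid invalid powers off-support
    return np.where(t > 0, np.exp(-safe_t ** (-1.0 / xi)), outside)


def gev_nll(y, f, xi=-0.25, eps=1e-12):
    """Per-sample Bernoulli negative log-likelihood with a GEV link."""
    p = np.clip(gev_cdf(f, xi), eps, 1.0 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1.0 - p))


def gev_grad(y, f, xi=-0.25, eps=1e-12):
    """Derivative of gev_nll with respect to the raw score f.

    A boosting round would fit the next tree to the negative of this value.
    """
    f = np.asarray(f, dtype=float)
    p = np.clip(gev_cdf(f, xi), eps, 1.0 - eps)
    if abs(xi) < 1e-12:
        dp = p * np.exp(-f)
    else:
        t = 1.0 + xi * f
        safe_t = np.where(t > 0, t, 1.0)
        dp = np.where(t > 0, p * safe_t ** (-1.0 / xi - 1.0), 0.0)
    return (p - y) / (p * (1.0 - p)) * dp


if __name__ == "__main__":
    scores = np.array([-2.0, 0.0, 2.0])
    # Unlike the sigmoid, the GEV link is not symmetric around a score of 0.
    print(gev_cdf(scores))
    print(gev_grad(np.array([1, 1, 0]), scores))
```

Under these assumptions a score of zero maps to a probability of exp(-1) rather than 0.5, which is the asymmetry the abstract refers to, and varying `xi` changes how quickly the response curve approaches its tails.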