Abstract
Commercial banks are required to explain the credit evaluation results to their customers. Therefore, banks attempt to improve the performance of their credit scoring models while ensuring the interpretability of the results. However, there is a tradeoff between the logistic regression model and machine learning-based techniques regarding interpretability and model performance because machine learning-based models are a black box. To deal with the tradeoff, in this study, we present a two-stage logistic regression method based on the Bayesian approach. In the first stage, we generate the derivative variables by linearly combining the original features with their explanatory powers based on the Bayesian inference. The second stage involves developing a credit scoring model through logistic regression using these derivative variables. Through this process, the explanatory power of a large number of original features can be utilized for default prediction, and the use of logistic regression maintains the model's interpretability. In the empirical analysis, the independent sample t-test reveals that our proposed approach significantly improves the model’s performance compared to that based on the conventional single-stage approach, i.e., the baseline model. The Kolmogorov–Smirnov statistics show a 3.42 percentage points (%p) increase, and the area under the receiver operating characteristic shows a 2.61%p increase. Given that our two-stage modeling approach has the advantages of interpretability and enhanced performance of the credit scoring model, our proposed method is essential for those in charge of banking who must explain credit evaluation results and find ways to improve the performance of credit scoring models.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have