Abstract

Most existing studies on credit scoring adopt classifier ensembles to handle imbalanced datasets. They apply resampling methods to generate multiple training subsets from which multiple base classifiers are constructed. However, this approach introduces several problems that degrade classification performance, namely information loss, model overfitting, and high computational cost. We therefore propose a novel ensemble approach for developing a credit scoring model based on cost-sensitive neural networks, called the Cost-sensitive Neural Network Ensemble (CS-NNE). In the proposed approach, multiple class weights are applied to the original training data, enabling the multiple base neural networks to account for the imbalanced classes. In this way, high diversity among the base classifiers can be achieved without the problems above. The approach's effectiveness is evaluated on five real-world credit datasets. One is a loan-application dataset provided by a financial institution in Thailand; the remaining datasets are publicly available and widely used in existing studies. The experimental results show that the proposed CS-NNE approach improves predictive performance over a single neural network on imbalanced credit datasets; on the Thai credit dataset, for example, it improves Area under the ROC Curve (AUC), Default Detection Rate (DDR), and G-Mean (GM) by 1.36%, 15.67%, and 6.11%, respectively, and achieves the best Misclassification Cost (MC). The proposed CS-NNE approach effectively addresses the class imbalance problem and outperforms many existing models. The resulting prediction model strikes a good balance between the default class (bad credit applicants) and the non-default class (good credit applicants), whereas existing approaches favor the non-default class over the default class (high specificity and low DDR), which results in non-performing loans (NPLs).
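To make the idea of a class-weighted ensemble concrete, the following is a minimal sketch in Python with NumPy, not the CS-NNE implementation itself. It assumes a small one-hidden-layer network, three hypothetical class weights (1, 3, 9), simple averaging of the base networks' probabilities, and synthetic data; none of these choices come from the abstract. Every base network sees the full training set, so no rows are discarded; only the weight placed on default-class errors differs. DDR is computed as the recall on the default class and G-Mean as the geometric mean of DDR and specificity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_weighted_net(X, y, w_default, hidden=8, epochs=500, lr=0.5, seed=0):
    """Train a one-hidden-layer network with class-weighted cross-entropy.

    y is 1 for default and 0 for non-default; w_default is a hypothetical
    weight placed on errors made on the default class.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    # Per-sample weights, normalised so the step size is comparable
    # across different class weights.
    sw = np.where(y == 1, w_default, 1.0)
    sw = sw / sw.sum()
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)              # hidden activations, (n, hidden)
        p = sigmoid(H @ W2 + b2)              # predicted default probability
        g = sw * (p - y)                      # weighted loss gradient w.r.t. logit
        gH = np.outer(g, W2) * (1.0 - H**2)   # backpropagate through tanh
        W2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gH)
        b1 -= lr * gH.sum(axis=0)
    return W1, b1, W2, b2

def predict_proba(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)

def ensemble_proba(models, X):
    # Assumption: base predictions are combined by simple averaging.
    return np.mean([predict_proba(m, X) for m in models], axis=0)

def ddr_and_gmean(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    ddr = tp / (tp + fn)           # recall on the default class
    specificity = tn / (tn + fp)   # recall on the non-default class
    return ddr, np.sqrt(ddr * specificity)

# Synthetic imbalanced data: 450 non-default rows, 50 default rows.
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(0.0, 1.0, (450, 4)),
               data_rng.normal(1.0, 1.0, (50, 4))])
y = np.concatenate([np.zeros(450), np.ones(50)])

class_weights = [1.0, 3.0, 9.0]   # hypothetical values, one per base network
models = [train_weighted_net(X, y, w, seed=i)
          for i, w in enumerate(class_weights)]
proba = ensemble_proba(models, X)
ddr, gmean = ddr_and_gmean(y, (proba >= 0.5).astype(int))
print(f"DDR={ddr:.3f}  G-Mean={gmean:.3f}")
```

In this sketch, diversity among the base networks comes only from the different class weights and random initialisations; how the actual approach selects its weights and combines its base networks is not specified in the abstract.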

Highlights

  • A credit scoring model is a statistical analysis tool that determines the creditworthiness of a loan applicant by estimating the probability of default based on historical data [1]

  • The proposed approach can address the problems in the credit scoring task and improve the performance of the credit scoring model

  • Credit scoring models have become a powerful tool for banks and other financial institutions to assess the creditworthiness of applicants

Summary

INTRODUCTION

A credit scoring model is a statistical analysis tool that determines the creditworthiness of a loan applicant by estimating the probability of default based on historical data [1]. Wei et al. [8] combined an outlier removal method with a classification algorithm to develop a credit scoring model called backflow learning, which relearns the misclassified data points and combines the predictions of the base learners through a two-layer ensemble. Using indirect cost-sensitive methods, He et al. [5] and Sun et al. [7] introduced, in 2018, the idea of generating training subsets with different resampling rates for ensemble classifiers to develop a credit scoring model. Their results were superior to those of the other algorithms compared. Popular ensemble methods, such as RF [53], XGBoost [54], Bagging [55], and AdaBoost [56], are included.
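The idea of building training subsets at different resampling rates can be illustrated with a short Python sketch. This is not the procedure of [5] or [7]; it assumes random undersampling of the non-default class, and the function name `undersampled_subsets`, the ratio values, and the synthetic data are all hypothetical.

```python
import numpy as np

def undersampled_subsets(X, y, ratios, seed=0):
    """Build one training subset per ratio of non-default to default rows.

    All default rows (y == 1) are kept; non-default rows (y == 0) are
    randomly undersampled without replacement.
    """
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    subsets = []
    for r in ratios:
        # Cap at the number of available non-default rows.
        n_major = min(len(majority), int(round(r * len(minority))))
        keep = rng.choice(majority, size=n_major, replace=False)
        idx = np.concatenate([minority, keep])
        rng.shuffle(idx)
        subsets.append((X[idx], y[idx]))
    return subsets

# Synthetic data: 450 non-default rows and 50 default rows.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(500, 4))
y = np.concatenate([np.zeros(450), np.ones(50)])

subsets = undersampled_subsets(X, y, ratios=[1.0, 2.0, 4.0])
sizes = [len(ys) for _, ys in subsets]
print(sizes)
```

Each subset would then train one base classifier. Note that every subset here omits part of the non-default class, which is one way the information loss attributed to resampling-based ensembles can arise.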

EXPERIMENTS
EXPERIMENTAL RESULTS
RESULTS ON THE THAI CREDIT DATASET
CONCLUSION