Abstract

Machine learning and artificial intelligence have achieved human-level performance in many application domains, including image classification, speech recognition and machine translation. In the financial domain, however, expert-based credit risk models still dominate. Establishing meaningful benchmarks and comparisons between machine-learning approaches and human expert-based models is a prerequisite for introducing novel methods. Therefore, our main goal in this study is to establish a new benchmark using real consumer data and to provide machine-learning approaches that can serve as baselines on this benchmark. We performed an extensive comparison between the machine-learning approaches and a human expert-based model, the FICO credit scoring system, using Survey of Consumer Finances (SCF) data. As the SCF data is non-synthetic and consists of a large number of real variables, we applied two variable-selection methods to select the most representative features for effective modelling and compared them: the first method used hypothesis tests, correlation and random forest-based feature importance measures, and the second method was a new approach (NAP) based only on random forests. We then built regression models based on various machine-learning algorithms, ranging from logistic regression and support vector machines to an ensemble of gradient boosted trees and deep neural networks. Our results demonstrated that if lending institutions in the 2001s had used their own credit scoring model constructed by the machine-learning methods explored in this study, their expected credit losses would have been lower and they would have been more sustainable. In addition, the deep neural networks and XGBoost algorithms trained on the subset selected by NAP achieved the highest area under the curve (AUC) and accuracy, respectively.
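The two evaluation metrics named in the abstract, AUC and accuracy, can be illustrated with a minimal standard-library sketch. This is not the study's evaluation code: the function names, the 0.5 decision threshold and the toy labels and scores are assumptions chosen for illustration, with label 1 standing for default.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 0/1 outcomes (1 = default, an assumed encoding).
    scores: predicted probabilities of default; tied scores share an average rank.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def accuracy(labels, scores, threshold=0.5):
    """Share of correct predictions when scores >= threshold are classed as default."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Toy data, purely illustrative.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc_score(toy_labels, toy_scores), accuracy(toy_labels, toy_scores))
```

AUC depends only on how the scores rank defaulters against non-defaulters, while accuracy depends on the chosen threshold, which is one reason the two metrics can favour different models.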

Highlights

  • For lending institutions, credit scoring systems aim to provide probability of default (PD) for their clients and to satisfy a minimum-loss principle for their sustainability

  • In the first step of the two-stage filter feature selection (TSFFS) algorithm, we considered statistically significant variables based on the t-test and chi-square test. Regarding the t-test, a two-sample t-test was assessed for continuous variables. For example, the total value of aggregate loan balance for home improvement is not related to

  • The main conclusion from the comparison is that machine-learning models performed better than FICO credit scoring in the 2001s
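The first TSFFS step described in the highlights, keeping variables that pass a t-test or chi-square test, might look roughly like the standard-library sketch below. Several choices here are assumptions rather than details from the paper: the Welch form of the two-sample t statistic, a normal approximation for its p-value (reasonable only for large samples), a 2x2 chi-square test without continuity correction for binary variables, the 0.05 significance level, and all function and variable names.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_pvalue(group_a, group_b):
    """Two-sided p-value for Welch's two-sample t statistic.

    Uses a normal approximation to the t distribution (assumption: large samples).
    Both groups must have non-zero variance overall, otherwise the standard error is 0.
    """
    std_err = math.sqrt(variance(group_a) / len(group_a)
                        + variance(group_b) / len(group_b))
    t_stat = (mean(group_a) - mean(group_b)) / std_err
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


def chi2_pvalue_2x2(table):
    """P-value of the chi-square independence test on a 2x2 table (no Yates correction)."""
    (a, b), (c, d) = table
    total = a + b + c + d
    row_sums = (a + b, c + d)
    col_sums = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


def first_stage_filter(continuous, categorical, alpha=0.05):
    """Return names of variables whose test p-value falls below alpha.

    continuous: name -> (values for defaulters, values for non-defaulters)
    categorical: name -> 2x2 table of counts (variable level x default status)
    """
    kept = [name for name, (bad, good) in continuous.items()
            if welch_t_pvalue(bad, good) < alpha]
    kept += [name for name, table in categorical.items()
             if chi2_pvalue_2x2(table) < alpha]
    return kept


# Hypothetical toy variables, not taken from the SCF data.
continuous_vars = {
    "income": ([20, 22, 19, 21, 23, 20], [40, 42, 39, 41, 43, 40]),
    "noise": ([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]),
}
categorical_vars = {
    "homeowner": [[30, 10], [10, 30]],
    "coin_flip": [[20, 20], [20, 20]],
}
print(first_stage_filter(continuous_vars, categorical_vars))
```

In this toy run only the variables whose distribution differs between defaulters and non-defaulters survive the filter; a production version would more likely rely on a statistics library for exact t-distribution p-values and larger contingency tables.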


Summary

Introduction

Credit scoring systems aim to provide the probability of default (PD) for a lending institution's clients and to satisfy a minimum-loss principle for the institution's sustainability. Credit officers or expert-based credit scoring models determine whether borrowers can fulfill their requirements, and this process has changed over time with technological advances. This change calls for an automated credit decision-making system that can avoid loss of opportunity or credit losses and so reduce the potential loss for each lending institution. In recent years, automated credit scoring has become crucial because of the growing number of financial services without human involvement. Using technology and automation to reduce operating costs for modern lending institutions requires the development of an accurate credit scoring model. Numerous authors have proposed different feature-selection methods for credit scoring, such as wrapper-feature-selection algorithms [13], the Wald statistic using the chi-square test [14], evolutionary feature selection with correlation [15], hybrid feature-selection methods [16] and multi-stage feature selection based on a genetic algorithm [17].

