A Two-Stage Hybrid Default Discriminant Model Based on Deep Forest

Gang Li, Meng-Di Shen, Hong-Dong Ma, Ke-Xin Zhang, Rong-Yue Liu

doi:10.3390/e23050582

Open Access

Abstract

Background: credit scoring models are an effective tool for banks and other financial institutions to identify potential defaulters. Credit scoring models built on machine learning methods such as deep learning discriminate defaults accurately, but they also have shortcomings, such as many hyperparameters and a heavy dependence on large data sets, and their interpretability and robustness leave considerable room for improvement. Methods: deep forest, or multi-Grained Cascade Forest (gcForest), is a deep model built from decision trees on the basis of the random forest algorithm. Through multi-grained scanning and cascade processing, gcForest can effectively identify and process high-dimensional feature information; it also has fewer hyperparameters and is highly robust. This paper therefore constructs a two-stage hybrid default discrimination model based on multiple feature selection methods and the gcForest algorithm, and tunes its parameters with the lowest type II error as the first criterion, and the highest AUC and accuracy as the second and third criteria. The gcForest model retains the interpretability and robustness advantages of traditional statistical models while also capturing the accuracy advantages of deep learning models. Results: the validity of the hybrid default discrimination model is verified on three real, publicly available credit data sets from the UCI repository: the Australian, Japanese, and German data sets. Conclusions: gcForest outperforms currently popular single classifiers such as ANN, common ensemble classifiers such as LightGBM, and CNNs in type II error, AUC, and accuracy. Comparison with the results of similar studies further verifies the robustness and effectiveness of the model.
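The tuning rule in the abstract is a lexicographic ranking: candidate parameter settings are compared on type II error first, and AUC and accuracy only break ties. A minimal sketch in Python, assuming label 1 marks a default, that type II error means a defaulter classified as non-default (the usual credit-scoring convention, assumed here), and a 0.5 decision threshold; the function names, labels, and candidate scores are hypothetical.

```python
from typing import Dict, Sequence

# Assumed label convention: 1 = default (bad borrower), 0 = non-default (good borrower).


def type_ii_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of actual defaulters that the model classifies as non-default."""
    defaulter_preds = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not defaulter_preds:
        raise ValueError("type II error is undefined without defaulters")
    return sum(1 for p in defaulter_preds if p == 0) / len(defaulter_preds)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mann-Whitney form: P(random defaulter scores above random non-defaulter), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return {
        "type_ii": type_ii_error(y_true, y_pred),
        "auc": auc(y_true, scores),
        "accuracy": accuracy(y_true, y_pred),
    }


def select_best(results: Dict[str, Dict[str, float]]) -> str:
    """Lexicographic choice: lowest type II error, then highest AUC, then highest accuracy."""
    return min(
        results,
        key=lambda name: (
            results[name]["type_ii"],
            -results[name]["auc"],
            -results[name]["accuracy"],
        ),
    )


# Hypothetical validation labels and default scores from three candidate parameter settings.
y_val = [1, 1, 0, 0, 1, 0]
candidates = {
    "config_a": [0.9, 0.4, 0.3, 0.2, 0.8, 0.6],
    "config_b": [0.7, 0.6, 0.5, 0.1, 0.9, 0.2],
    "config_c": [0.6, 0.55, 0.58, 0.1, 0.9, 0.2],
}
results = {name: evaluate(y_val, s) for name, s in candidates.items()}
print(select_best(results))  # prints config_b
```

Here config_b and config_c both miss no defaulters (type II error 0), so the tie is broken by AUC (1.0 against 8/9), and config_a loses outright with a type II error of 1/3. Encoding the priorities as a sort key tuple keeps the rule explicit and makes it easy to reorder the criteria.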

Highlights

In recent years, research on the default discriminant model has received extensive attention from researchers and financial institutions
The feature selection methods used at this stage are as follows: (1) full-variable logistic regression; (2) stepwise regression based on the AIC; (3) stepwise regression based on the BIC; (4) Lasso-logistic regression; (5) Elastic Net logistic regression
After data preprocessing in the first stage (Section 4.2), the five feature selection algorithms are applied, and the resulting feature subsets are evaluated by the type II error, AUC, and accuracy of logistic regression
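The stepwise, Lasso, and Elastic Net selectors listed above rest on standard criteria, written here in their usual textbook form as a sketch rather than the paper's own notation. The scaling of the log-likelihood, the symbols $k$, $m$, $\lambda$, and $\alpha$, and how $\lambda$ and $\alpha$ are tuned are assumptions; the intercept $\beta_0$ is taken as unpenalized.

```latex
% Logistic model for default (y_i = 1): n borrowers, m candidate features, k estimated coefficients
p_i = \frac{1}{1 + e^{-(\beta_0 + x_i^{\top}\beta)}}, \qquad
\ell(\beta_0, \beta) = \sum_{i=1}^{n} \bigl[ y_i \ln p_i + (1 - y_i) \ln (1 - p_i) \bigr]

% Stepwise regression: add or drop the variable that most lowers the criterion; stop when none does
\mathrm{AIC} = 2k - 2\,\ell(\hat\beta_0, \hat\beta), \qquad
\mathrm{BIC} = k \ln n - 2\,\ell(\hat\beta_0, \hat\beta)

% Lasso-logistic: features with \hat\beta_j = 0 are dropped
(\hat\beta_0, \hat\beta) = \arg\min_{\beta_0, \beta} \Bigl\{ -\ell(\beta_0, \beta) + \lambda \sum_{j=1}^{m} |\beta_j| \Bigr\}

% Elastic Net logistic: \alpha \in [0, 1] mixes the L1 and L2 penalties (\alpha = 1 recovers the Lasso)
(\hat\beta_0, \hat\beta) = \arg\min_{\beta_0, \beta} \Bigl\{ -\ell(\beta_0, \beta)
  + \lambda \sum_{j=1}^{m} \Bigl[ \alpha |\beta_j| + \tfrac{1 - \alpha}{2} \beta_j^{2} \Bigr] \Bigr\}
```

Because $\ln n > 2$ once $n \ge 8$, the BIC charges more per coefficient than the AIC on any realistic credit data set, so BIC-based stepwise regression tends to retain fewer variables.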

Summary

Introduction

Research on default discriminant models has received extensive attention from researchers and financial institutions. To address the shortcomings of existing research and improve the interpretability, classification performance, and robustness of credit scoring models, this paper establishes a new two-stage hybrid model that combines multiple feature selection methods with gcForest. The model takes into account the differences and complementarities between traditional statistical models and artificial intelligence models, and combines the two so that each offsets the other's weaknesses. Zhou et al. (2017) proposed gcForest as a new tree-based ensemble method and showed that its performance is highly competitive with that of deep neural networks (DNNs) across a wide range of tasks.
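The cascade part of gcForest can be sketched as follows: each level's forests output class vectors, these are concatenated with the original features and passed to the next level, and levels stop being added once validation performance no longer improves. This is a minimal standard-library sketch of that control flow only. `RandomStumpForest` is a hypothetical toy stand-in for the random forests and completely random tree forests gcForest actually uses; training-set class vectors are produced in-sample rather than by k-fold cross-validation; multi-grained scanning is omitted; and validation accuracy is used as the stopping metric, although a type II error rule could be substituted.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = List[float]
ClassVectors = List[List[float]]  # one [p(non-default), p(default)] pair per row


class RandomStumpForest:
    """Hypothetical toy forest: averages depth-1 trees whose split feature and
    threshold are both drawn at random (a stand-in for gcForest's forests)."""

    def __init__(self, n_stumps: int = 25, seed: int = 0) -> None:
        self.n_stumps = n_stumps
        self.rng = random.Random(seed)
        self.stumps: List[Tuple[int, float, float, float]] = []

    def fit(self, X: Sequence[Row], y: Sequence[int]) -> "RandomStumpForest":
        self.stumps = []
        for _ in range(self.n_stumps):
            j = self.rng.randrange(len(X[0]))
            t = self.rng.uniform(min(r[j] for r in X), max(r[j] for r in X))
            left = [yi for r, yi in zip(X, y) if r[j] <= t]
            right = [yi for r, yi in zip(X, y) if r[j] > t]
            # Laplace-smoothed default rate on each side of the split
            p_left = (sum(left) + 1) / (len(left) + 2)
            p_right = (sum(right) + 1) / (len(right) + 2)
            self.stumps.append((j, t, p_left, p_right))
        return self

    def predict_proba(self, X: Sequence[Row]) -> ClassVectors:
        out = []
        for r in X:
            p1 = sum(pl if r[j] <= t else pr for j, t, pl, pr in self.stumps) / len(self.stumps)
            out.append([1.0 - p1, p1])
        return out


def augment(X: Sequence[Row], vectors: List[ClassVectors]) -> List[Row]:
    """Concatenate each row's ORIGINAL features with every forest's class vector."""
    return [list(r) + [p for vec in vecs for p in vec] for r, vecs in zip(X, zip(*vectors))]


def mean_default_prob(vectors: List[ClassVectors]) -> List[float]:
    return [sum(v[i][1] for v in vectors) / len(vectors) for i in range(len(vectors[0]))]


def cascade_fit_predict(X_tr, y_tr, X_va, y_va, X_te,
                        factories: List[Callable[[int], RandomStumpForest]],
                        max_levels: int = 5) -> Tuple[List[float], int]:
    """Add cascade levels until validation accuracy stops improving; return the
    best level's default probabilities for X_te and the number of levels kept."""
    best_acc, best_probs, n_levels = -1.0, [], 0
    tr_in, va_in, te_in = X_tr, X_va, X_te
    for level in range(max_levels):
        forests = [make(level).fit(tr_in, y_tr) for make in factories]
        tr_vec = [f.predict_proba(tr_in) for f in forests]  # in-sample; gcForest uses k-fold here
        va_vec = [f.predict_proba(va_in) for f in forests]
        te_vec = [f.predict_proba(te_in) for f in forests]
        va_prob = mean_default_prob(va_vec)
        acc = sum((p >= 0.5) == (yi == 1) for p, yi in zip(va_prob, y_va)) / len(y_va)
        if acc <= best_acc:
            break  # no gain on validation: stop growing the cascade
        best_acc, best_probs, n_levels = acc, mean_default_prob(te_vec), level + 1
        tr_in, va_in, te_in = augment(X_tr, tr_vec), augment(X_va, va_vec), augment(X_te, te_vec)
    return best_probs, n_levels


# Synthetic data: default whenever the first feature exceeds 0.5; the second feature is noise.
rng = random.Random(42)
X_tr = [[rng.random(), rng.random()] for _ in range(200)]
y_tr = [int(r[0] > 0.5) for r in X_tr]
X_va = [[rng.random(), rng.random()] for _ in range(100)]
y_va = [int(r[0] > 0.5) for r in X_va]
X_te = [[0.95, 0.5], [0.05, 0.5]]  # clear defaulter vs clear non-defaulter
factories = [lambda s: RandomStumpForest(25, seed=2 * s), lambda s: RandomStumpForest(25, seed=2 * s + 1)]
probs, n_levels = cascade_fit_predict(X_tr, y_tr, X_va, y_va, X_te, factories)
print(n_levels, [round(p, 3) for p in probs])
```

Passing the original features to every level, rather than only the previous level's class vectors, mirrors the gcForest design and lets later levels correct earlier ones; the number of levels is chosen by the data instead of being fixed in advance, which is one reason the method needs few hyperparameters.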

Feature Selection

Application of Deep Learning Model in Credit Scoring

Construction of Hybrid Default Discriminant Model Based on GcForest

Experimental Data Set

Data Preprocessing

Evaluation Indicators

Analysis of Feature Selection Results

Analysis of the Results of Default Discrimination

Evaluation Indicator

Comparison with Other Studies

Conclusions

Methods


Journal: Entropy	Publication Date: May 8, 2021
Citations: 8	License type: CC BY 4.0

Similar Papers

Hyperparameter Optimization for Machine Learning Models Based on Bayesian Optimization
Jia Wu ... Si-Hao Deng
Journal of Electronic Science and Technology | VOL. 17 | 11 Dec 2019

A comparative study of using Random Forests (RF), Extreme Learning Machine (ELM) and Deep Learning (DL) algorithms in modelling Roadside Particulate Matter (PM10 & PM2.5)
A Suleiman ... M R Tight
IOP Conference Series: Earth and Environmental Science | VOL. 476 | 01 Apr 2020

Applications of Machine Learning Methods in Health Outcomes Research: Heart Failure in Women
10 Dec 2020

Deep Multigrained Cascade Forest for Hyperspectral Image Classification
Xiaobo Liu ... Zhihua Cai
IEEE Transactions on Geoscience and Remote Sensing | VOL. 57 | 01 Oct 2019
