Abstract

With growing demand for credit evaluation and risk prediction, building an effective credit evaluation model for small- and medium-sized enterprises (SMEs) has become an active research topic. Building on previous studies, this paper proposes a two-layer feature extraction method based on Gradient Boosting Decision Tree (GBDT) and Convolutional Neural Network (CNN). First, GBDT combines and automatically screens the original features, handles the missing values in the features, and outputs transformed high-dimensional sparse features. A CNN then extracts further features, and finally a logistic regression (LR) model makes the prediction. In the simulation experiment, a dataset of 14,366 small- and medium-sized enterprise credit evaluations serves as the analysis sample to verify the results. The results show that the GBDT-CNN-LR model has the best performance. The model also shows good generalization ability and stability in the reliability test.
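The first layer described above (GBDT turning the original features into high-dimensional sparse features) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn, uses a synthetic dataset in place of the 14,366 enterprise records, picks arbitrary hyperparameters, and feeds the encoded leaves straight into LR, which corresponds to a plain GBDT-LR baseline with no CNN stage. The synthetic data has no missing values, so missing-value handling is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the credit data (assumption: the real dataset
# is not reproduced here).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: fit the GBDT on the original features.
n_trees = 50  # illustrative value, not taken from the paper
gbdt = GradientBoostingClassifier(
    n_estimators=n_trees, max_depth=3, random_state=0
)
gbdt.fit(X_train, y_train)

# Step 2: map each sample to the leaf it reaches in every tree.
# For binary classification apply() has shape (n_samples, n_trees, 1).
leaves_train = gbdt.apply(X_train)[:, :, 0]
leaves_test = gbdt.apply(X_test)[:, :, 0]

# Step 3: one-hot encode the leaf indices. Each row ends up with exactly
# one active column per tree, which gives a high-dimensional sparse vector.
encoder = OneHotEncoder(handle_unknown="ignore")
Z_train = encoder.fit_transform(leaves_train)
Z_test = encoder.transform(leaves_test)

# Step 4: logistic regression on the transformed features.
lr = LogisticRegression(max_iter=1000)
lr.fit(Z_train, y_train)
test_accuracy = lr.score(Z_test, y_test)

print("original features:", X_train.shape[1])
print("sparse GBDT features:", Z_train.shape[1])
print("test accuracy:", test_accuracy)
```

Each tree contributes one active entry per sample, so a row of `Z_train` encodes a combination of feature splits rather than raw feature values. In practice the GBDT and the LR are often fitted on separate splits to limit overfitting; the sketch fits both on the same training split for brevity.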

Highlights

  • The credit financing of small- and medium-sized enterprises faces two problems. On the one hand, their small scale and their high operating and capital flow risks restrict their financing channels and financing limits; on the other hand, the high debt repayment risk and fraudulent behavior of small- and medium-sized enterprises expose the banking industry to a large risk of capital loss

  • To address the shortcomings of existing research, this paper proposes a hybrid ensemble model that uses the GBDT-CNN method for feature extraction to evaluate corporate credit. The model uses the Gradient Boosting Decision Tree (GBDT)-Convolutional Neural Network (CNN) method to extract features from the original data. This handles missing values in the samples effectively while reducing the difficulty of feature engineering, which in turn reduces the assumptions about the data missing mechanism and the dependence on the data distribution model, and makes the model more robust to abnormal situations in the original data

  • The traditional GBDT-logistic regression (LR) model still struggles to reach the expected high accuracy. The accuracy of LR is limited by the preceding feature engineering. Therefore, this paper proposes applying a CNN to the feature vector generated by GBDT. The intention is to find higher-dimensional features to use as input data and thereby improve the prediction accuracy of LR
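The idea in the last highlight, a CNN applied to the GBDT feature vector with LR on top, can be sketched as below. This is a hedged illustration only: the source does not give the network architecture, so the layer sizes, the use of PyTorch, the class name `ConvLR`, and the random 0/1 vectors standing in for GBDT output are all assumptions. The final linear layer followed by a sigmoid plays the role of the logistic regression.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for GBDT output: 256 samples, each a sparse
# 0/1 vector of length 120, with random binary labels.
n_samples, n_features = 256, 120
Z = (torch.rand(n_samples, n_features) < 0.1).float()
y = torch.randint(0, 2, (n_samples,)).float()


class ConvLR(nn.Module):
    """1-D CNN feature extractor followed by a logistic regression head."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=3, padding=1),  # 1 input channel
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(4),  # pool each channel down to 4 values
            nn.Flatten(),             # (N, 8, 4) -> (N, 32)
        )
        # A single linear unit; with a sigmoid this is logistic regression
        # on the CNN features.
        self.lr_head = nn.Linear(8 * 4, 1)

    def forward(self, z):
        h = self.features(z.unsqueeze(1))  # (N, L) -> (N, 1, L) -> (N, 32)
        return self.lr_head(h).squeeze(1)  # logits, shape (N,)


model = ConvLR()
loss_fn = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# A few full-batch training steps, just to show the loop.
for _ in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(Z), y)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    default_prob = torch.sigmoid(model(Z))  # predicted probabilities
```

Here the CNN and the LR head are trained jointly end to end. Whether the paper trains them jointly or fits LR separately on the extracted CNN features is not stated in this excerpt, so the joint training is a design choice of the sketch.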


Summary

Introduction

The credit financing of small- and medium-sized enterprises (SMEs) faces two problems. On the one hand, their small scale and their high operating and capital flow risks restrict their financing channels and financing limits; on the other hand, the high debt repayment risk and fraudulent behavior of small- and medium-sized enterprises expose the banking industry to a large risk of capital loss. Compared with credit evaluation methods based on machine learning algorithms, traditional statistical methods often require more complicated feature engineering in the early stage, which is inefficient, and the accuracy of the model depends heavily on that early feature engineering work. Wang et al. [16] selected appropriate indicators and used an improved SVM model to detect the credit risk of SMEs. Luo et al. [17] applied a deep belief network with Restricted Boltzmann Machines to credit scoring and obtained higher accuracy than traditional logistic regression methods. Zhong et al. [18] compared the machine learning training effects of BP, ELM, I-ELM, and SVM, and the results showed that the ELM and BP neural networks performed better.

Wireless Communications and Mobile Computing
