Abstract
Conventional corporate credit evaluation models rely primarily on financial variables in conjunction with supervised learning methods. However, the labeled sample information required by supervised learning methods is generally costly and slow to acquire, and is therefore difficult to obtain in practice. In addition, the introduction of non-financial variables can be expected to provide greater diagnostic scope. The present study addresses these issues by proposing a semi-supervised generalized additive logistic regression model for detecting corporate credit anomalies based on a high proportion of unlabeled sample information that includes both financial and non-financial variables. The model not only can accommodate linearly non-separable problems, but can also be trained using labeled and unlabeled samples at the same time, while simultaneously performing parameter estimation and variable selection. We also develop more precise definitions of corporate credit anomalies to increase the accuracy of corporate default risk assessments. The model is trained and tested using a dataset composed of actual financial and non-financial corporate data freely available on the Internet, and is demonstrated to provide better variable selection and more accurate and robust credit anomaly prediction than other state-of-the-art models. The results reveal key financial variables relevant to corporate credit anomaly detection, and also verify that the non-financial variables significantly improve the corporate credit anomaly prediction accuracy of the model.
Highlights
Credit is a generalized metric that appraises the likelihood that a debtor can repay the principal and interest on a loan as scheduled without default
We propose a semi-supervised generalized additive logistic regression (SSGALR) model for detecting corporate credit anomalies based on a high proportion of unlabeled sample information
The corporate credit anomaly prediction performance of the proposed SSGALR algorithm was verified by comparisons with conventional logistic regression algorithms, including the supervised semi-parametric logistic regression (SSPLR) and supervised logistic regression (SLR) algorithms, in addition to extreme gradient boosting (XGBoost), which is a high-performance ensemble learning algorithm commonly employed in regression and classification applications
Summary
Credit is a generalized metric that appraises the likelihood that a debtor can repay the principal and interest on a loan as scheduled without default. Conventional corporate credit evaluation models are primarily based on financial variables in conjunction with supervised learning methods. While the limited availability of labeled sample information can be addressed by the use of a semi-supervised learning method, the introduction of non-financial variables can be expected to provide greater diagnostic scope. The study addresses these deficiencies in the tools currently applied by market participants for evaluating corporate risk of default in real time. We propose a semi-supervised generalized additive logistic regression (SSGALR) model for detecting corporate credit anomalies based on a high proportion of unlabeled sample information. The SSGALR model is able to make full use of unlabeled non-financial data samples to improve its learning performance, while simultaneously conducting variable selection when processing high-dimensional data.