Abstract

Conventional corporate credit evaluation models are based primarily on financial variables in conjunction with supervised learning methods. However, acquiring the labeled samples required by supervised learning is generally costly and slow, so labeled data are difficult to obtain in practice, while the introduction of non-financial variables can be expected to provide greater diagnostic scope. The present study addresses these issues by proposing a semi-supervised generalized additive logistic regression model for detecting corporate credit anomalies from data with a high proportion of unlabeled samples that include both financial and non-financial variables. The model can accommodate linearly non-separable problems, can be trained on labeled and unlabeled samples at the same time, and performs parameter estimation and variable selection simultaneously. We also develop more precise definitions of corporate credit anomalies to increase the accuracy of corporate default risk assessments. The model is trained and tested on a dataset of actual financial and non-financial corporate data freely available on the Internet, and is shown to provide better variable selection and more accurate and robust credit anomaly prediction than other state-of-the-art models. The results reveal key financial variables correlated with corporate credit anomaly detection, and also verify that the non-financial variables significantly improve the corporate credit anomaly prediction accuracy of the model.
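For orientation, a generalized additive logistic regression is usually written in the generic textbook form below. The abstract does not give the paper's exact specification, so the symbols here ($\beta_0$, $f_j$, $x_j$, $p$) are assumed notation rather than the authors' own.

```latex
\[
  \log \frac{P(y = 1 \mid x)}{1 - P(y = 1 \mid x)}
    = \beta_0 + \sum_{j=1}^{p} f_j(x_j)
\]
```

Here $y = 1$ would mark a credit anomaly, $x_j$ is the $j$-th financial or non-financial variable, and each $f_j$ is a smooth function estimated from the data. Allowing nonlinear $f_j$ is what lets a model of this form handle linearly non-separable problems, and variable selection corresponds to estimating $f_j \equiv 0$ for uninformative variables.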

Highlights

  • Credit is a generalized metric that appraises the likelihood that a debtor can repay the principal and interest on a loan as scheduled without default

  • We propose a semi-supervised generalized additive logistic regression (SSGALR) model for detecting corporate credit anomalies based on a high proportion of unlabeled sample information

  • The corporate credit anomaly prediction performance of the proposed SSGALR algorithm was verified by comparisons with conventional logistic regression algorithms, including the supervised semi-parametric logistic regression (SSPLR) and supervised logistic regression (SLR) algorithms, in addition to extreme gradient boosting (XGBoost), which is a high-performance ensemble learning algorithm commonly employed in regression and classification applications

Summary

INTRODUCTION

Credit is a generalized metric that appraises the likelihood that a debtor can repay the principal and interest on a loan as scheduled without default. Conventional corporate credit evaluation models are primarily based on financial variables in conjunction with supervised learning methods. The limited availability of labeled sample information can be addressed by the use of a semi-supervised learning method, and the introduction of non-financial variables can be expected to provide greater diagnostic scope. The study addresses these deficiencies in the tools currently applied by market participants for evaluating corporate risk of default in real time. We propose a semi-supervised generalized additive logistic regression (SSGALR) model for detecting corporate credit anomalies based on a high proportion of unlabeled sample information. The SSGALR model makes full use of unlabeled samples, including non-financial data, to improve its learning performance, while simultaneously conducting variable selection when processing high-dimensional data.
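The idea of training on labeled and unlabeled samples while selecting variables can be illustrated with a minimal sketch. This is not the authors' SSGALR algorithm: it swaps in plain self-training (pseudo-labeling) for the semi-supervised step, a per-feature polynomial basis for the smooth additive components, and an L1 penalty solved by proximal gradient descent for variable selection. The data are synthetic, and all function names and parameter values are hypothetical.

```python
import numpy as np

def additive_basis(X):
    # Per-feature basis [x_j, x_j^2]: a crude stand-in for smooth components f_j.
    return np.hstack([X, X ** 2])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fit_l1_logistic(B, y, lam=0.02, lr=0.1, n_iter=2000):
    """L1-penalized logistic regression via proximal gradient descent (ISTA)."""
    n, p = B.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        err = sigmoid(B @ w + b) - y
        w = w - lr * (B.T @ err) / n
        b = b - lr * err.mean()
        # Soft-thresholding sets small coefficients exactly to zero.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w, b

def self_train(B_lab, y_lab, B_unl, threshold=0.95, rounds=5):
    """Self-training: repeatedly add confidently predicted unlabeled samples."""
    B_cur, y_cur, pool = B_lab.copy(), y_lab.copy(), B_unl.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        w, b = fit_l1_logistic(B_cur, y_cur)
        prob = sigmoid(pool @ w + b)
        confident = (prob >= threshold) | (prob <= 1.0 - threshold)
        if not confident.any():
            break
        B_cur = np.vstack([B_cur, pool[confident]])
        y_cur = np.concatenate([y_cur, (prob[confident] >= 0.5).astype(float)])
        pool = pool[~confident]
    return fit_l1_logistic(B_cur, y_cur)

# Synthetic data (assumption): the label depends nonlinearly on feature 0,
# linearly on feature 1, and not at all on features 2 and 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
logit = 3.0 * (X[:, 0] ** 2 - 1.0) + 2.0 * X[:, 1]
y = (rng.random(1000) < sigmoid(logit)).astype(float)

B = additive_basis(X)
B_train, B_test = B[:700], B[700:]
y_train, y_test = y[:700], y[700:]
# Standardize using all training rows; no labels are needed for this step.
mu, sd = B_train.mean(axis=0), B_train.std(axis=0)
B_train, B_test = (B_train - mu) / sd, (B_test - mu) / sd

n_lab = 60  # only the first 60 training rows are treated as labeled
w, b = self_train(B_train[:n_lab], y_train[:n_lab], B_train[n_lab:])

accuracy = float(((sigmoid(B_test @ w + b) >= 0.5) == (y_test == 1.0)).mean())
selected = np.flatnonzero(w)  # indices of basis columns kept by the L1 penalty
print("test accuracy:", round(accuracy, 3))
print("selected basis columns:", selected.tolist())
```

Columns 0 to 3 of the basis are the raw features and columns 4 to 7 their squares, so the selected indices show which components the penalty retained. A fuller treatment would use spline bases and a group penalty so that a variable's components are kept or dropped together.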

CREDIT EVALUATION MODELS
CREDIT EVALUATION VARIABLES
DEFINITION OF CORPORATE CREDIT ANOMALIES
MODEL SETTINGS
OPTIMIZATION ALGORITHM
EMPIRICAL SAMPLE SELECTION AND DATA
VARIABLES FOR CORPORATE CREDIT ANOMALY DETECTION
MODEL PREDICTION PERFORMANCE
Algorithm
CONCLUSIONS
