Semi-Supervised Learning Classification Based on Generalized Additive Logistic Regression for Corporate Credit Anomaly Detection

Song Han

doi:10.1109/access.2020.3035128

Abstract

Conventional corporate credit evaluation models are based primarily on financial variables in conjunction with supervised learning methods. However, acquiring the labeled samples required by supervised learning methods is generally costly and time-consuming, so such samples are difficult to obtain in practice, while introducing non-financial variables can be expected to provide greater diagnostic scope. The present study addresses these issues by proposing a semi-supervised generalized additive logistic regression model for detecting corporate credit anomalies based on a high proportion of unlabeled sample information that includes both financial and non-financial variables. The model can not only accommodate linearly non-separable problems, but can also be trained on labeled and unlabeled samples at the same time while simultaneously performing parameter estimation and variable selection. We also develop more precise definitions of corporate credit anomalies to increase the accuracy of corporate default risk assessments. The model is trained and tested on a dataset composed of actual financial and non-financial corporate data freely available on the Internet, and is shown to provide better variable selection and more accurate and robust credit anomaly prediction than other state-of-the-art models. The results reveal key financial variables correlated with corporate credit anomaly detection, and also verify that the non-financial variables significantly improve the corporate credit anomaly prediction accuracy of the model.
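For orientation, a generalized additive logistic regression replaces the linear predictor of ordinary logistic regression with a sum of flexible functions of each variable, which is what lets it handle linearly non-separable problems. A generic form is shown below, where $\pi_i = P(y_i = 1 \mid \mathbf{x}_i)$, $b_{jk}$ are basis functions for variable $j$, and $n_l$ and $n_u$ are the numbers of labeled and unlabeled samples. The second line is only an illustrative assumption of how labeled and unlabeled samples and variable selection could be combined in one objective (entropy regularization on unlabeled samples with weight $\gamma$, and a group-lasso penalty with weight $\lambda$); the paper's own objective may differ.

$$
\log\frac{\pi_i}{1-\pi_i} = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}), \qquad f_j(x) = \sum_{k=1}^{K} \beta_{jk}\, b_{jk}(x)
$$

$$
\min_{\beta_0,\boldsymbol{\beta}}\; -\frac{1}{n_l}\sum_{i=1}^{n_l}\Big[y_i\log\pi_i + (1-y_i)\log(1-\pi_i)\Big]
\;-\;\frac{\gamma}{n_u}\sum_{i=n_l+1}^{n_l+n_u}\Big[\pi_i\log\pi_i + (1-\pi_i)\log(1-\pi_i)\Big]
\;+\;\lambda\sum_{j=1}^{p}\sqrt{K}\,\lVert\boldsymbol{\beta}_j\rVert_2
$$

Here $\boldsymbol{\beta}_j = (\beta_{j1},\dots,\beta_{jK})$. Setting an entire group $\boldsymbol{\beta}_j$ to zero removes variable $j$ from the model, which is one way parameter estimation and variable selection can happen in a single fit.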

Highlights

Credit is a generalized metric that appraises the likelihood that a debtor can repay the principal and interest on a loan as scheduled without default
We propose a semi-supervised generalized additive logistic regression (SSGALR) model for detecting corporate credit anomalies based on a high proportion of unlabeled sample information
The corporate credit anomaly prediction performance of the proposed SSGALR algorithm was verified by comparison with conventional logistic regression algorithms, namely the supervised semi-parametric logistic regression (SSPLR) and supervised logistic regression (SLR) algorithms, and with extreme gradient boosting (XGBoost), a high-performance ensemble learning algorithm commonly employed in regression and classification applications

Summary

INTRODUCTION

Credit is a generalized metric that appraises the likelihood that a debtor can repay the principal and interest on a loan as scheduled without default. Conventional corporate credit evaluation models are based primarily on financial variables in conjunction with supervised learning methods. The limited availability of labeled sample information can be addressed by using a semi-supervised learning method, while the introduction of non-financial variables can be expected to provide greater diagnostic scope. The study addresses these deficiencies in the tools currently available to market participants for evaluating corporate default risk in real time. We propose a semi-supervised generalized additive logistic regression (SSGALR) model for detecting corporate credit anomalies based on a high proportion of unlabeled sample information. The SSGALR model is able to make full use of unlabeled non-financial data samples to improve its learning performance, while simultaneously conducting variable selection when processing high-dimensional data.
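The idea of training on labeled and unlabeled samples at once while selecting variables can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the SSGALR estimator from the paper: each variable is expanded into a hypothetical piecewise-linear (hinge) basis with knots at quantiles, unlabeled samples enter through an entropy-minimization term, and a group-lasso penalty fitted by proximal gradient descent zeroes out whole variables. The function names (`fit_ss_additive_logit`, `predict_proba`, `selected_variables`) and all defaults (`lam`, `gamma`, `lr`, `n_iter`, `n_knots`) are illustrative choices.

```python
import numpy as np

# Sketch only: the hinge basis, entropy regularization and group-lasso
# penalty are illustrative assumptions, not the paper's exact SSGALR method.


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def make_knots(X, n_knots=3):
    """Place n_knots knots at interior quantiles of every column."""
    qs = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]
    return [np.quantile(X[:, j], qs) for j in range(X.shape[1])]


def expand(X, knots):
    """Expand each x_j into [x_j, (x_j - k_1)_+, ..., (x_j - k_K)_+].

    Also returns the variable (group) index of every basis column.
    """
    cols, groups = [], []
    for j, kj in enumerate(knots):
        x = X[:, j]
        cols.append(x)
        groups.append(j)
        for k in kj:
            cols.append(np.maximum(0.0, x - k))
            groups.append(j)
    return np.column_stack(cols), np.array(groups)


def fit_ss_additive_logit(X_lab, y, X_unl, lam=0.02, gamma=0.1,
                          lr=0.1, n_iter=2000, n_knots=3):
    # Knots and standardization use labeled and unlabeled samples together.
    knots = make_knots(np.vstack([X_lab, X_unl]), n_knots)
    B_lab, groups = expand(X_lab, knots)
    B_unl, _ = expand(X_unl, knots)
    B_all = np.vstack([B_lab, B_unl])
    mu, sd = B_all.mean(axis=0), B_all.std(axis=0) + 1e-12
    B_lab, B_unl = (B_lab - mu) / sd, (B_unl - mu) / sd

    b0, beta = 0.0, np.zeros(B_lab.shape[1])
    for _ in range(n_iter):
        z_l, z_u = b0 + B_lab @ beta, b0 + B_unl @ beta
        p_l, p_u = sigmoid(z_l), sigmoid(z_u)
        # Gradient of the mean labeled log-loss w.r.t. the logits.
        r = (p_l - y) / len(y)
        # Gradient of gamma * mean entropy of unlabeled predictions:
        # dH/dz = -z * p * (1 - p), which pushes unlabeled logits away from 0.
        s = gamma * (-z_u * p_u * (1.0 - p_u)) / len(z_u)
        b0 -= lr * (r.sum() + s.sum())  # intercept is not penalized
        beta -= lr * (B_lab.T @ r + B_unl.T @ s)
        # Proximal step: group soft-thresholding, one group per variable.
        for j in range(len(knots)):
            idx = groups == j
            norm = np.linalg.norm(beta[idx])
            thresh = lr * lam * np.sqrt(idx.sum())
            beta[idx] = 0.0 if norm <= thresh else beta[idx] * (1.0 - thresh / norm)
    return {"knots": knots, "mu": mu, "sd": sd, "b0": b0,
            "beta": beta, "groups": groups}


def predict_proba(model, X):
    B, _ = expand(X, model["knots"])
    B = (B - model["mu"]) / model["sd"]
    return sigmoid(model["b0"] + B @ model["beta"])


def selected_variables(model):
    """Indices of variables whose coefficient group is not all zero."""
    keep = model["groups"][np.abs(model["beta"]) > 0]
    return sorted({int(j) for j in keep})


if __name__ == "__main__":
    # Synthetic demo: only variable 0 (linear) and 1 (nonlinear) matter.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(660, 6))
    logit = 2.0 * X[:, 0] + 2.0 * (X[:, 1] ** 2 - 1.0)
    y_all = (rng.random(len(X)) < sigmoid(logit)).astype(float)
    # 60 labeled samples, 600 samples whose labels are discarded.
    model = fit_ss_additive_logit(X[:60], y_all[:60], X[60:])
    print("selected variables:", selected_variables(model))
```

A variable is dropped only when its whole coefficient group is thresholded to zero, so `lam` controls how many variables survive, while `gamma` sets how strongly unlabeled samples push the decision boundary away from dense regions; in practice both would be tuned, for example by cross-validation on the labeled data.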

CREDIT EVALUATION MODELS

CREDIT EVALUATION VARIABLES

DEFINITION OF CORPORATE CREDIT ANOMALIES

MODEL SETTINGS

OPTIMIZATION ALGORITHM

EMPIRICAL SAMPLE SELECTION AND DATA

VARIABLES FOR CORPORATE CREDIT ANOMALY DETECTION

MODEL PREDICTION PERFORMANCE

Algorithm

CONCLUSIONS


Journal: IEEE Access	Publication Date: Jan 1, 2020
Citations: 30	License type: CC BY 4.0

Similar Papers

Modeling corporate financial distress using financial and non-financial variables
Senthil Arasu Balasubramanian ... Radhakrishna G.S
International Journal of Law and Management | VOL. 61 | 23 Oct 2019

Predicting credit risk on the basis of financial and non-financial variables and data mining
Younes Boujelbene ... Sihem Khemakhem
Review of Accounting and Finance | VOL. 17 | 13 Aug 2018

Predicting financial distress using financial and non-financial variables
Francois Van Der Colff ... Frans Vermaak
Journal of Economic and Financial Sciences | VOL. 8 | 30 Apr 2015

English
Hasni Yusrianti ... Tien Norma Habsari
Jurnal Akuntansi dan Investasi | VOL. 17 | 01 Jan 2015

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Semi-Supervised Learning Classification Based on Generalized Additive Logistic Regression for Corporate Credit Anomaly Detection

Abstract

Highlights

Summary

Talk to us

Similar Papers

More From: IEEE Access