Abstract

Credit scoring is an efficient tool for financial institutions to implement credit risk management. In recent years, many novel machine learning models have been developed for credit scoring. Among them, heterogeneous ensemble models have received much attention because of their superior performance. This paper presents a new heterogeneous ensemble model based on the generalized Shapley value and the Choquet integral. The model first uses a fuzzy measure to express the interactive characteristics between any two coalitions of base learners. Based on an objective function that combines accuracy and diversity, a linear programming model is built to determine the fuzzy measure. To retain as much of the original information as possible in the training stage, normal fuzzy numbers are employed to express the predicted values of the base learners. Then, the generalized Shapley Choquet integral (GSCI) aggregation operator is defined to calculate the comprehensive predicted value of the ensemble model. Based on the defined aggregation operator and the linear programming model, a GSCI approach is proposed for ensemble credit scoring. To illustrate the efficiency and feasibility of the GSCI approach, experiments with thirteen machine learning models are conducted on four public credit scoring datasets and three real-world P2P lending datasets with large volumes of samples. Furthermore, robustness tests and a comparative analysis are carried out to demonstrate the adaptability and performance of the GSCI-based ensemble model.
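To make the aggregation idea more concrete, the following is a minimal Python sketch of the classical discrete Choquet integral and the classical Shapley value computed from a fuzzy measure. It is not the GSCI operator defined in the paper, which uses the generalized Shapley value and normal fuzzy numbers. The learner names, the scores, the fuzzy measure values, and the function names `choquet_integral` and `shapley_values` are all hypothetical and chosen only for illustration; in the paper the fuzzy measure is obtained from a linear programming model rather than set by hand.

```python
from itertools import combinations
from math import factorial


def choquet_integral(values, measure):
    """Classical discrete Choquet integral.

    values:  dict mapping learner name -> predicted score
    measure: dict mapping frozenset of learner names -> weight in [0, 1]
    """
    # Sort learners by ascending score: x_(1) <= x_(2) <= ... <= x_(n).
    order = sorted(values, key=values.get)
    total, previous = 0.0, 0.0
    for i, learner in enumerate(order):
        # Coalition of learners whose score is at least the current one.
        coalition = frozenset(order[i:])
        total += (values[learner] - previous) * measure[coalition]
        previous = values[learner]
    return total


def shapley_values(learners, measure):
    """Classical Shapley value of each learner under the fuzzy measure."""
    n = len(learners)
    result = {}
    for learner in learners:
        others = [name for name in learners if name != learner]
        phi = 0.0
        for size in range(n):
            weight = factorial(n - size - 1) * factorial(size) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of `learner` to coalition `s`.
                phi += weight * (measure[s | {learner}] - measure[s])
        result[learner] = phi
    return result


# Hypothetical example: three base learners and a hand-picked monotone
# fuzzy measure (illustrative values only, not taken from the paper).
learners = ["svm", "rf", "nn"]
measure = {
    frozenset(): 0.0,
    frozenset({"svm"}): 0.3,
    frozenset({"rf"}): 0.4,
    frozenset({"nn"}): 0.2,
    frozenset({"svm", "rf"}): 0.8,
    frozenset({"svm", "nn"}): 0.5,
    frozenset({"rf", "nn"}): 0.7,
    frozenset({"svm", "rf", "nn"}): 1.0,
}
scores = {"svm": 0.6, "rf": 0.8, "nn": 0.5}

ensemble_score = choquet_integral(scores, measure)
importance = shapley_values(learners, measure)
```

Because the measure of a coalition need not equal the sum of its members' measures, the aggregated score can reward or penalize learners that tend to agree, which a plain weighted average cannot do. The Shapley values summarize each learner's overall importance and sum to the measure of the full set of learners.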

Highlights

  • Credit risk is the main risk for financial institutions, and the effectiveness of credit risk management is the critical issue for the survival and development of financial institutions

  • Various machine learning techniques have been developed recently and have gained much attention, and they can be further split into single models, such as Neural Networks (NN) [21], Support Vector Machines (SVM) [22][24], Decision Trees (DT) [25], and Naive Bayes (NB) [26], and ensemble models, such as AdaBoost [27] and Random Forests (RF) [28], [29]

  • To cope with the problems mentioned above, Ala’raj & Abbod [4], [36] stated that base learners that interact cooperatively within an ensemble model can achieve higher predictive accuracy than those combined with traditional aggregation operators, and they presented a new combination approach based on a consensus system [50] to resolve the conflicts among base learners

Summary

INTRODUCTION

Credit risk is the main risk for financial institutions, and the effectiveness of credit risk management is the critical issue for the survival and development of financial institutions.

LITERATURE REVIEW
ENSEMBLE MODELS
Related work
BASIC CONCEPTS
THE GSCI-BASED ENSEMBLE MODEL
A MODEL FOR THE OPTIMAL FUZZY MEASURE ON
ALGORITHM OF THE GSCI-BASED ENSEMBLE MODEL
CREDIT DATASET AND DATA PREPROCESSING
BASE LEARNERS
EVALUATION METRICS AND BENCHMARKS
STATISTICAL TESTS OF SIGNIFICANCE
CLASSIFICATION RESULTS
SIGNIFICANCE TESTS
ROBUSTNESS TESTS
Comparative analysis
CONCLUSION