Abstract

In realistic credit-scoring problems, only a small fraction of customers can be labeled, while the majority remain unlabeled. Traditional supervised learning methods can use only the labeled samples to build credit-scoring models, which makes satisfactory performance difficult to achieve. Semi-supervised learning (SSL) can exploit both labeled and unlabeled samples to address this problem, but existing credit-scoring research has primarily constructed single semi-supervised models. This study combines SSL, cost-sensitive learning, the group method of data handling (GMDH), and ensemble learning to propose a GMDH-based cost-sensitive semi-supervised selective ensemble (GCSSE) model. The model involves two stages: (1) train an ensemble of N base classifiers on the initial labeled training set L, use it to selectively label samples from the unlabeled set U, add those samples with their predicted labels to the training set, and update the N base classifiers on the enlarged training set; (2) classify L and the test set with the trained base classifiers, and construct a cost-sensitive GMDH neural network to obtain the selective ensemble classification results for the test set. Experimental comparisons on five public credit-scoring datasets and an empirical analysis of a real-world customer credit dataset suggest that the proposed model achieves the best overall credit-scoring performance compared with one supervised ensemble model and three semi-supervised ensemble models.
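The two-stage procedure described above can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration only: the base classifiers, the confidence threshold for selective pseudo-labeling, the misclassification costs, and the cost-weighted logistic-regression combiner (standing in for the cost-sensitive GMDH neural network, which is not reproduced here) are not the authors' implementation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_stage1(L_X, L_y, U_X, base_estimators, confidence=0.9, rounds=3):
    """Stage 1 (sketch): train N base classifiers on the labeled set L,
    selectively pseudo-label high-confidence samples from U, and retrain."""
    models = [clone(est).fit(L_X, L_y) for est in base_estimators]
    X_train, y_train, pool = L_X.copy(), L_y.copy(), U_X.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        # Average predicted probabilities across the ensemble.
        proba = np.mean([m.predict_proba(pool) for m in models], axis=0)
        keep = proba.max(axis=1) >= confidence  # selective labeling rule (assumed)
        if not keep.any():
            break
        X_train = np.vstack([X_train, pool[keep]])
        y_train = np.concatenate([y_train, proba[keep].argmax(axis=1)])
        pool = pool[~keep]
        models = [clone(est).fit(X_train, y_train) for est in base_estimators]
    return models


def fit_stage2(models, L_X, L_y, cost_fn=5.0, cost_fp=1.0):
    """Stage 2 stand-in: combine base-classifier outputs on L with a
    cost-weighted logistic regression instead of the cost-sensitive GMDH."""
    meta_X = np.column_stack([m.predict_proba(L_X)[:, 1] for m in models])
    weights = np.where(L_y == 1, cost_fn, cost_fp)  # hypothetical asymmetric costs
    return LogisticRegression().fit(meta_X, L_y, sample_weight=weights)


def predict(models, combiner, X):
    meta_X = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
    return combiner.predict(meta_X)


if __name__ == "__main__":
    # Synthetic, imbalanced data as a stand-in for a credit-scoring dataset.
    X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
    X_lab, X_rest, y_lab, y_rest = train_test_split(
        X, y, train_size=0.1, stratify=y, random_state=0)
    X_unlab, X_test, _, y_test = train_test_split(
        X_rest, y_rest, test_size=0.3, random_state=0)
    base = [DecisionTreeClassifier(max_depth=d, random_state=0) for d in (3, 5, 7)]
    models = fit_stage1(X_lab, y_lab, X_unlab, base)
    combiner = fit_stage2(models, X_lab, y_lab)
    print("test accuracy:", (predict(models, combiner, X_test) == y_test).mean())
```

In this sketch the selective element is the confidence threshold on pseudo-labels and the cost sensitivity enters through the sample weights of the combiner; in the GCSSE model both roles are played by the cost-sensitive GMDH network, which also performs the selective combination of base classifiers.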
