Abstract

Credit scoring is an important tool that helps banks and lending companies guard against commercial risk, and it lays a good foundation for building individual credit. Ensemble algorithms have made appealing progress in improving credit scoring. In this study, to meet the challenge of large-scale credit scoring, we propose a heterogeneous deep forest model (Heter-DF) for credit scoring whose design addresses base learner selection, the diversity of base learners, and the ensemble strategy. Heter-DF is designed as a scalable cascading framework whose complexity can grow with the scale of the credit dataset. Moreover, each level of Heter-DF is built from multiple heterogeneous tree-based ensemble base learners, which keeps the ensemble from producing homogeneous predictions. In addition, a weighted voting mechanism is introduced to highlight important information and suppress irrelevant features, making Heter-DF a robust model for credit scoring. Experimental results on four credit scoring datasets with six evaluation metrics show that the cascading framework is a good choice for ensembling tree-based base learners. A comparison between homogeneous and heterogeneous ensembles further demonstrates the effectiveness of Heter-DF. Experiments with training sets of different sizes indicate that Heter-DF is a scalable framework that not only handles large-scale credit scoring but also performs well where small-scale credit scoring is desirable. Finally, exploiting the good interpretability of tree-based structures, we make a preliminary exploration of the global interpretation of Heter-DF.
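The abstract does not say how the voting weights are derived or when the cascade stops growing. The following is a minimal sketch of three pieces of a deep-forest-style cascade under stated assumptions, not the authors' implementation:

* Weighted soft voting among heterogeneous base learners. This assumes each learner's weight is proportional to a validation score such as AUC.
* Feature augmentation. As in gcForest, this assumes each level passes its class-probability vectors to the next level.
* A stopping rule. This assumes the cascade adds levels only while the validation score improves, which is how it could grow with the data.

All function names (`normalize_weights`, `weighted_vote`, `augment_features`, `levels_to_keep`) and the `min_gain` parameter are hypothetical.

```python
from typing import List, Sequence


def normalize_weights(scores: Sequence[float]) -> List[float]:
    """Assumed scheme: turn per-learner validation scores (e.g., AUC)
    into voting weights that sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [s / total for s in scores]


def weighted_vote(probas: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Combine the class-probability vectors that several base learners
    produced for ONE sample into a single weighted-average vector."""
    if not probas or len(probas) != len(weights):
        raise ValueError("need one weight per base learner")
    n_classes = len(probas[0])
    combined = [0.0] * n_classes
    for p, w in zip(probas, weights):
        for k in range(n_classes):
            combined[k] += w * p[k]
    return combined


def augment_features(x: Sequence[float],
                     probas: Sequence[Sequence[float]]) -> List[float]:
    """gcForest-style cascade step (assumed): the next level sees the raw
    features followed by every base learner's class-probability vector."""
    out = list(x)
    for p in probas:
        out.extend(p)
    return out


def levels_to_keep(val_scores: Sequence[float], min_gain: float = 0.0) -> int:
    """Hypothetical stopping rule: keep adding cascade levels while the
    validation score improves by more than min_gain; return the level count."""
    best = float("-inf")
    kept = 0
    for i, score in enumerate(val_scores):
        if score > best + min_gain:
            best = score
            kept = i + 1
        else:
            break
    return kept


if __name__ == "__main__":
    # Three hypothetical heterogeneous learners (say, a random forest,
    # extra trees, and gradient boosting) scoring one applicant as
    # [P(good), P(bad)].
    probas = [[0.8, 0.2], [0.6, 0.4], [0.7, 0.3]]
    weights = normalize_weights([0.5, 0.3, 0.2])
    print(weighted_vote(probas, weights))        # approximately [0.72, 0.28]
    print(augment_features([1.0, 2.0], probas))  # raw features + 6 probabilities
    print(levels_to_keep([0.80, 0.83, 0.84, 0.839]))  # 3
```

Weighting learners by validation performance is one common way to let stronger learners dominate the vote. Heter-DF's actual mechanism, which the abstract says also suppresses irrelevant features, may differ.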
