Abstract

Credit scoring is attracting increasing attention in the Chinese consumer finance industry. Traditional approaches are prone to sample selection bias because they use only accepted applicants, whereas the applicant population also includes rejected applicants. Reject inference is a technique that infers good/bad labels for rejected applicants and can thereby help overcome this bias in credit scoring. However, previously proposed reject inference methods usually ignore the imbalanced distribution of the accepted data: in most practical consumer loan applications, good applicants far outnumber bad ones. Both the neglect of rejected data and the imbalanced distribution of accepted data weaken the performance of current credit scoring models. In this paper, we propose a novel reject inference framework for consumer credit scoring that takes the imbalanced data distribution into account. First, we use label spreading, an advanced graph-based semi-supervised learning algorithm, to solve the reject inference problem. Second, because good and bad samples are imbalanced among accepted applicants, we apply imbalanced learning with a modified Synthetic Minority Over-sampling Technique (SMOTE) before reject inference. Then, we study six binary classifiers within the proposed framework for credit scoring modeling. Finally, we present the results of four exact experiments as well as online A/B tests for performance evaluation, using data provided by a leading Chinese fintech company. Empirical results indicate that the proposed framework outperforms traditional scoring models across different evaluation metrics, offering an approach that can advance credit scoring research and improve fintech practice.
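
The two preprocessing steps named in the abstract, over-sampling the minority (bad) class among accepted applicants and then inferring labels for rejected applicants with label spreading, can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the toy two-feature data, the function names `smote` and `label_spreading`, and the parameter values are hypothetical. The abstract does not describe how its SMOTE variant is modified, so plain SMOTE interpolation is used instead. The code is written in dependency-free Python for readability. In practice one would more likely use `sklearn.semi_supervised.LabelSpreading` and `imblearn.over_sampling.SMOTE`.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Plain SMOTE (not the paper's modified variant): create each synthetic
    sample on the segment between a minority point and a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base point
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + t * (o - b) for b, o in zip(base, other)])
    return synthetic


def label_spreading(points, labels, alpha=0.8, gamma=1.0, n_iter=50):
    """Graph-based label spreading. labels: 0 = good, 1 = bad, -1 = unlabeled.
    Iterates F <- alpha * S @ F + (1 - alpha) * Y0 on a normalized RBF graph."""
    n = len(points)
    # RBF affinity matrix with a zero diagonal
    w = [[0.0 if i == j
          else math.exp(-gamma * math.dist(points[i], points[j]) ** 2)
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    # Symmetric normalization: S = D^(-1/2) W D^(-1/2)
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    # One-hot initial labels; unlabeled rows are all zeros
    y0 = [[1.0 if lab == c else 0.0 for c in (0, 1)] for lab in labels]
    f = [row[:] for row in y0]
    for _ in range(n_iter):
        f = [[alpha * sum(s[i][j] * f[j][c] for j in range(n))
              + (1 - alpha) * y0[i][c] for c in (0, 1)] for i in range(n)]
    return [0 if row[0] >= row[1] else 1 for row in f]


# Hypothetical toy data: accepted applicants are imbalanced (6 good, 2 bad).
good = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.0]]
bad = [[3.0, 3.0], [3.2, 2.9]]
rejected = [[0.2, 0.2], [3.1, 3.1]]  # no observed repayment outcome

# Step 1: balance the accepted data before reject inference.
synthetic = smote(bad, n_new=len(good) - len(bad))

# Step 2: spread labels from the balanced accepted data to rejected applicants.
points = good + bad + synthetic + rejected
labels = [0] * len(good) + [1] * (len(bad) + len(synthetic)) + [-1] * len(rejected)
inferred = label_spreading(points, labels)[-len(rejected):]
print(inferred)
```

The accepted samples, the synthetic minority samples, and the rejected samples with their inferred labels would then form the training set for whichever binary classifier is being evaluated.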

