Credit scoring is an essential technique for credit risk management in the financial industry. However, most credit scoring models face the challenge of reject inference: post-loan performance data are unavailable for rejected applicants, which leads to sample selection bias and inaccurate credit assessment. Traditional credit scoring methods tackle this issue by assuming that the labels of rejected samples are missing at random (MAR) and by measuring sample similarity directly in the original feature space. Neither strategy is suitable for real-world business scenarios. Inspired by metric learning and transductive learning, we propose a novel credit scoring model, the transductive semi-supervised metric network (TSSMN), which formalizes reject inference as a semi-supervised binary classification problem under the prior assumption that labels are missing not at random (MNAR). TSSMN consists of two interconnected modules: the embedding metric network (EMN), which maps samples from the original feature space to a metric space for similarity measurement, and the transductive propagation network (TPN), which performs label propagation based on sample similarity. We evaluate TSSMN on a real-world credit dataset and compare it with traditional credit scoring methods. The results indicate that TSSMN can overcome sample selection bias and classify credit applicants more accurately. TSSMN therefore has the potential to enhance credit risk assessment in real-world business scenarios.
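To make the two-module idea concrete, the following is a minimal sketch of "embed, then propagate labels by similarity". It is not the TSSMN implementation. Everything in it is an assumption made for illustration: the function names `embed` and `propagate_labels`, the fixed untrained projection `W` standing in for a learned EMN, the Gaussian similarity graph, the parameters `alpha` and `sigma`, and the closed-form label-spreading solution used in place of the TPN. The toy data are synthetic and do not model MNAR selection.

```python
import numpy as np

def embed(X, W):
    # Placeholder for the EMN (assumption): a fixed, untrained linear map plus tanh.
    return np.tanh(X @ W)

def propagate_labels(Z, y, alpha=0.9, sigma=0.5):
    # Z: (n, d) embedded samples.
    # y: (n,) ints with 1 = good, 0 = bad, -1 = unlabeled (rejected applicant).
    n = Z.shape[0]
    # Gaussian similarity computed in the embedded space, without self-loops.
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    A = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Symmetric normalization: S = D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1) + 1e-12)
    S = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # One-hot seed matrix; unlabeled rows stay all-zero.
    Y = np.zeros((n, 2))
    idx = np.flatnonzero(y >= 0)
    Y[idx, y[idx]] = 1.0
    # Closed-form label spreading: solve (I - alpha * S) F = Y.
    return np.linalg.solve(np.eye(n) - alpha * S, Y)

# Synthetic toy data (assumption): two well-separated applicant clusters.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=2.0, scale=0.3, size=(10, 2)),   # good applicants
    rng.normal(loc=-2.0, scale=0.3, size=(10, 2)),  # bad applicants
])
y_true = np.array([1] * 10 + [0] * 10)

# Hide half of the labels in each cluster to mimic rejected applicants.
y_obs = y_true.copy()
y_obs[5:10] = -1
y_obs[15:20] = -1

W = 0.5 * np.eye(2)  # hypothetical embedding weights
scores = propagate_labels(embed(X, W), y_obs)
pred = scores.argmax(axis=1)  # inferred label for every sample
```

In this sketch the unlabeled rows receive their scores purely from their neighbors in the embedded space, which is the transductive element: the rejected samples take part in the graph at inference time rather than being scored by a model fitted on accepted samples alone.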