Abstract

Dataset shift, the mismatch between the distribution of the training data and that of the data to be predicted, is very common in credit scoring. The macroeconomic environment and risk control strategies are likely to evolve over time, and borrowers' behavior patterns may also change, so a model trained on past data may not be applicable to the most recent period. Although dataset shift can degrade model performance, the vast majority of studies do not take it into account. In this study, we propose a method based on adversarial validation, in which the training samples whose distribution is closest to that of the data to be predicted are selected for cross-validation to ensure generalization performance. The remaining training samples, whose distribution is inconsistent with the data to be predicted, also take part in training but not in validation, which makes full use of all the data and further improves model performance. To verify the effectiveness of the proposed method, comparative experiments against several other data split methods are conducted on the Lending Club dataset. The experimental results demonstrate the importance of the dataset shift problem in credit scoring and the superiority of the proposed method.

Keywords: Dataset shift, Data distribution, Credit scoring, Adversarial validation, Cross-validation
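The split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifier or the selection rule, so the hand-written logistic regression, the `val_fraction` cutoff, the synthetic one-feature data, and every function name below are assumptions made only for illustration. The idea shown is to train a classifier to tell training rows from rows to be predicted, treat the training rows it scores as most "prediction-like" as the cross-validation pool, and keep all other training rows in the fitting set of every fold.

```python
import math
import random


def sigmoid(z):
    # Clamp z so math.exp never overflows.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Plain logistic regression fitted by batch gradient descent
    (a stand-in for whatever adversarial classifier is actually used)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def adversarial_split(train_X, test_X, val_fraction=0.25):
    """Label train rows 0 and test rows 1, fit the classifier, and return
    (similar_idx, rest_idx): the train rows that look most like the test
    data, and all remaining train rows."""
    X = train_X + test_X
    y = [0] * len(train_X) + [1] * len(test_X)
    w, b = fit_logistic(X, y)
    scores = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
              for xi in train_X]
    order = sorted(range(len(train_X)), key=lambda i: scores[i], reverse=True)
    n_similar = max(1, int(val_fraction * len(train_X)))
    return order[:n_similar], order[n_similar:]


def cv_folds(similar_idx, rest_idx, k=5):
    """Yield (fit_idx, valid_idx) pairs. Validation rows come only from the
    test-like pool; the dissimilar rows are used for fitting in every fold."""
    for f in range(k):
        valid = similar_idx[f::k]
        fit = [i for j, i in enumerate(similar_idx) if j % k != f] + rest_idx
        yield fit, valid


# Synthetic example: the single feature drifts upward between the periods.
random.seed(0)
old = [[random.gauss(0.0, 1.0)] for _ in range(200)]  # historical samples
new = [[random.gauss(2.0, 1.0)] for _ in range(100)]  # samples to be predicted
similar_idx, rest_idx = adversarial_split(old, new, val_fraction=0.25)
folds = list(cv_folds(similar_idx, rest_idx, k=5))
```

In this toy setting the selected pool is simply the historical rows with the largest feature values, since those resemble the shifted data most. With real credit data one would typically swap in a stronger classifier and inspect its AUC first: an AUC near 0.5 would suggest little shift, in which case an ordinary random split may be adequate.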
