Abstract
SummaryRejection inference aims to reduce sample bias and to improve model performance in credit scoring. We propose a semisupervised clustering approach as a new rejection inference technique. K-prototype clustering can deal with mixed types of numeric and categorical characteristics, which are common in consumer credit data. We identify homogeneous acceptances and rejections and assign labels to part of the rejections according to the label of acceptances. We test the performance of various rejection inference methods in logit, support vector machine and random-forests models based on data sets of real consumer loans. The predictions of clustering rejection inference show advantages over other traditional rejection inference methods. Inferring the label of the rejection from semisupervised clustering is found to help to mitigate the sample bias problem and to improve the predictive accuracy.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: Journal of the Royal Statistical Society Series A: Statistics in Society
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.