Abstract
Credit risk scoring predictions help lenders in the online social lending market to discriminate between potentially good borrowers (who will repay the loan) and bad borrowers (who will default). In such a market, defaulted borrowers typically make up a smaller share of the sample than non-defaulted borrowers, so the sample is class imbalanced. Class imbalance may reduce the accuracy of default predictions, because classifiers tend to be biased towards the majority class (good borrowers). We analyse default prediction performance when class rebalancing methods are combined with different regression and machine learning techniques. We also propose combining multiple probability predictions to improve predictive performance. The analysis is based on a book of loans with a three-year term, funded in the 2010–2015 period through the online platform of Lending Club. The results show that some measures of predictive accuracy tend to improve when the scoring models are trained on a rebalanced rather than an imbalanced sample, except when the extreme gradient boosting approach is applied. Finally, we find that combining multiple probability predictions via regularised logistic regression may help to improve predictive accuracy.
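The two ideas summarised above, rebalancing the training sample and combining several probability predictions through a regularised logistic regression, can be illustrated with a minimal sketch. It is not the paper's implementation. Random undersampling, the L2 (ridge) penalty, the plain gradient-descent fit, the function names, and the synthetic data are all assumptions made for illustration; the abstract does not specify which rebalancing methods or which penalty are used.

```python
import math
import random


def undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes are the same size.

    Assumption: random undersampling stands in for 'class rebalancing'.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [X[i] for i in kept], [y[i] for i in kept]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_ridge_logistic(P, y, lam=0.1, lr=0.5, epochs=500):
    """Fit an L2-penalised logistic regression on base-model probabilities P.

    Assumption: an L2 penalty with strength lam, fitted by batch gradient descent.
    """
    n, k = len(P), len(P[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * k, 0.0
        for row, label in zip(P, y):
            err = sigmoid(b + sum(wj * pj for wj, pj in zip(w, row))) - label
            gb += err
            for j in range(k):
                gw[j] += err * row[j]
        # The penalty shrinks the weights; the intercept is left unpenalised.
        w = [wj - lr * (gw[j] / n + lam * wj) for j, wj in enumerate(w)]
        b -= lr * gb / n
    return w, b


def combine(P, w, b):
    """Turn each row of base-model probabilities into one combined probability."""
    return [sigmoid(b + sum(wj * pj for wj, pj in zip(w, row))) for row in P]


# Synthetic, imbalanced toy sample: roughly 10% defaults (label 1).
rng = random.Random(42)
y = [1 if rng.random() < 0.1 else 0 for _ in range(500)]

# Hypothetical default probabilities from two noisy base scoring models.
P = [
    [min(1.0, max(0.0, (0.7 if label else 0.3) + rng.gauss(0, 0.15))) for _ in range(2)]
    for label in y
]

P_bal, y_bal = undersample(P, y)         # rebalance before fitting
w, b = fit_ridge_logistic(P_bal, y_bal)  # learn how to weight the base models
scores = combine(P, w, b)                # combined default probabilities
```

In practice the combiner would be fitted on out-of-sample predictions from the base models and evaluated on held-out loans; the sketch skips that split to stay short.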