Abstract

Credit risk scoring predictions represent an effective guide for lenders to discriminate between potential good (who will repay the loan) and bad (who will default) borrowers in the online social lending market. A common characteristic of such a market is a lower percentage of defaulted borrowers than non-defaulted borrowers; thus, the sample is class imbalanced. Class imbalance may affect the accuracy of default predictions, as classifiers tend to be biased towards the majority class (good borrowers). We analyse the default prediction performance when combining class rebalancing methods with different regression and machine learning techniques. We also propose to combine multiple probability predictions to improve the predictive performance. The analysis is based on a book of loans (with a three-year term) funded in the 2010–2015 period through the online platform of Lending Club. The results show that some measures of predictive accuracy tend to improve when the scoring models are trained using a rebalanced, rather than an imbalanced, sample, except when the extreme gradient boosting approach is applied. Finally, we find that combining multiple probability predictions via regularised logistic regression may help to improve the predictive accuracy.
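To make the idea of class rebalancing concrete, the following is a minimal sketch of random oversampling, one common rebalancing method. The abstract does not say which rebalancing methods were used, so the choice of technique here is an assumption, and the function name `random_oversample` and the toy data are hypothetical illustrations rather than the authors' implementation.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from the original sample and append resampled rows.
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Made-up toy sample: label 1 = defaulted, label 0 = repaid.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]

bal_rows, bal_labels = random_oversample(rows, labels)
balanced_counts = Counter(bal_labels)
```

In a setting like the one described, such resampling would normally be applied to the training sample only, so that predictive accuracy is still measured on data with the original class proportions.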
