Abstract

Peer to peer lending is famous for easy and fast loans from complicated traditional lending institutions. Therefore, big data and machine learning are needed for credit risk analysis, especially for potential defaulters. However, data imbalance and high computation have a terrible effect on machine learning prediction performance. This paper proposes a stacking ensemble learning with features selection based on embedded techniques (gradient boosted trees (GBDT), random forest (RF), adaptive boosting (AdaBoost), extra gradient boosting (XGBoost), light gradient boosting machine (LGBM), and decision tree (DT)) to predict the credit risk of individual borrowers on peer to peer (P2P) lending. The stacking ensemble model is created from a stack of meta-learners used in feature selection. The feature selection+ stacking model produces an average of 94.54% accuracy and 69.10 s execution time. RF meta-learner+Stacking ensemble is the best classification model, and the LGBM meta-learner+stacking ensemble is the fastest execution time. Based on experimental results, this paper showed that the credit risk prediction for P2P lending could be improved using the stacking ensemble model in addition to proper feature selection.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call