Abstract

Peer-to-peer (P2P) lending is a financial innovation that offers loans to individuals and companies without intermediaries. In a P2P lending system, the risk of loan default can cause financial losses for the company. Many studies have sought to reduce this risk by developing default prediction classifiers with a focus on increasing accuracy. However, two major obstacles to prediction are data imbalance and low-performing classification algorithms. The purpose of this study is to improve the accuracy of default risk prediction by balancing the data and by using a stacking ensemble with a meta-learner. The proposed model consists of three optimization parts: the Synthetic Minority Oversampling Technique (SMOTE), feature selection, and stacking ensemble learning. SMOTE is used to balance the data, while LightGBM-based feature selection and stacking ensemble learning (LGBFS-StackingXGBoost) are used to optimize classification accuracy. The stacking ensemble combines three base-learner algorithms, namely KNN, SVM, and Random Forest, with XGBoost as the meta-learner algorithm. The model was tested on two datasets: the online P2P lending dataset and the Lending Club loan data analysis dataset. The evaluation results show that LGBFS-StackingXGBoost is the best-performing model on both datasets, with an accuracy of 99.982% on the online P2P lending dataset and 91.434% on the Lending Club loan data analysis dataset. This study shows that the accuracy of the prediction model can be improved using the LGBFS-StackingXGBoost method.
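
As a rough illustration of the data-balancing step only, the sketch below shows the basic idea behind SMOTE in plain Python: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the toy data are assumptions made for this example; this is not the study's implementation, and the feature selection and stacking stages are not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority
    samples and their k nearest minority neighbours (basic SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: 8 non-default loans vs. 3 defaults, two features each.
majority = [[float(x), float(x % 3)] for x in range(8)]
minority = [[10.0, 1.0], [11.0, 2.0], [12.0, 1.5]]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay within the range of the observed minority data rather than duplicating existing rows.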
