Credit Default Risk Prediction of Lenders with Resampling Methods

Tong Chen

doi:10.1109/mlbdbi54094.2021.00032

Abstract

Peer-to-peer (P2P) lending platforms connect lenders and borrowers directly, without many complicated procedures. Default risk prediction estimates whether a borrower will repay a loan on time, which helps investors on a P2P lending platform make better decisions. A good prediction protects investors by reducing their risk. However, class imbalance in default risk data degrades prediction accuracy, which can lead investors to make wrong decisions and suffer great losses. This paper uses XGBoost with three resampling methods to predict default risk while overcoming the class imbalance problem. The three resampling methods are SMOTE, NearMiss, and manual 1:1 random selection. First, we preprocess the data to improve its quality and support better analysis. Then, we perform feature engineering to create valuable features from the preprocessed data. After that, we address the class imbalance in default risk prediction with SMOTE, NearMiss, and manual 1:1 random selection, respectively. Finally, we train our model on the processed data. To verify the effectiveness of the proposed method, we compare it with Logistic Regression, Random Forest, and LightGBM. The results show that XGBoost combined with the three resampling methods can predict default risk without being affected by the class imbalance problem.
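
As an illustration of the simplest of the three resampling methods, the following is a minimal sketch of manual 1:1 random selection. The abstract does not describe how the selection is carried out, so this assumes it means randomly undersampling the majority class until it matches the minority class in size. The function name `undersample_one_to_one`, the `seed` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def undersample_one_to_one(rows, labels, seed=0):
    """Randomly keep an equal number of rows from each of two classes.

    Assumption: "1:1 random selection" means shrinking the majority class
    to the size of the minority class by uniform random sampling.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    if len(by_class) != 2:
        raise ValueError("expected exactly two classes")

    # The smaller class sets the per-class sample size.
    n_minority = min(len(indices) for indices in by_class.values())

    # Sample n_minority indices from each class without replacement.
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_minority))
    rng.shuffle(kept)

    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (hypothetical): 8 repaid loans (label 0) and 2 defaults (label 1).
rows = [[i, i * 10] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = undersample_one_to_one(rows, labels)
print(Counter(balanced_labels))
```

The balanced rows and labels could then be passed to any classifier. Random undersampling discards majority-class rows, which is the usual trade-off against oversampling approaches such as SMOTE that synthesize new minority-class rows instead.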

Full Text