Abstract

In the past few years, Peer-to-Peer lending (P2P lending) has grown rapidly around the world. The main idea of P2P lending is disintermediation, that is, removing intermediaries such as banks. For small businesses and individuals without sufficient credit or credit history, P2P lending is a good way to apply for a loan. However, the fundamental problem of P2P lending is information asymmetry, which makes it difficult to estimate the default risk of a loan correctly. Lenders decide whether to fund a loan based only on the information provided by borrowers. As a result, P2P lending datasets are imbalanced: they contain unequal numbers of fully paid and default loans. Imbalanced datasets are common in the real world, for example in credit card fraud detection and in identifying defective products in manufacturing. Unfortunately, imbalanced data are poorly suited to standard machine learning schemes. In our scenario, models without any adaptive methods would focus on learning the normal repayment class. However, the characteristics of the minority class are critical in the lending business. In this study, we use several machine learning schemes to predict the default risk of P2P lending, together with re-sampling and cost-sensitive mechanisms to handle imbalanced datasets. Furthermore, we use datasets from Lending Club to validate our proposed scheme. The experimental results show that our proposed scheme can effectively raise the prediction accuracy for default risk.

Highlights

  • Peer-to-Peer lending (P2P lending) dates back to 2005 and has grown rapidly around the world in recent years

  • Information asymmetry is a fundamental problem of P2P lending because lenders decide whether to fund a loan based only on information provided by borrowers

  • A third assumption is that different classes have the same error cost. Earlier work introduced sampling strategies and cost-sensitive learning to address imbalanced datasets, and used performance metrics better suited to such data, such as the confusion matrix, precision, and F1-score
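To illustrate why metrics such as the confusion matrix, precision, and F1-score are preferred over plain accuracy on imbalanced data, here is a minimal Python sketch. The 90/10 split between fully paid and default loans and the "always predict fully paid" model are hypothetical toy values chosen for illustration; they are not taken from the Lending Club data or from the paper's experiments.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1, returning 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: 0 = fully paid (majority), 1 = default (minority).
y_true = [0] * 90 + [1] * 10
# A naive model that always predicts "fully paid".
y_pred = [0] * 100

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

print("confusion (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("accuracy:", accuracy)
print("precision, recall, F1 for the default class:", precision, recall, f1)
```

In this toy case the naive model looks accurate while never identifying a single default, which is the failure that the confusion matrix and F1-score expose.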


Summary

Introduction

Index terms: Peer-to-Peer lending, imbalanced datasets, re-sampling, machine learning.

Peer-to-Peer lending (P2P lending) dates back to 2005 and has grown rapidly around the world in recent years. The Peer-to-Peer lending dataset is imbalanced because the numbers of fully paid and default loans are not equal. This study uses under-sampling and cost-sensitive learning to deal with the imbalanced dataset.
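The two techniques named here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: the function names `random_under_sample` and `balanced_class_weights` are hypothetical, the under-sampling is plain random selection down to the minority class size, and the weighting rule n / (k × class count) is one common heuristic for cost-sensitive learning. The text does not specify which under-sampling variant or cost matrix the study uses.

```python
import random
from collections import Counter


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class matches the smallest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Weight each class by n / (k * count), so rarer classes cost more to misclassify."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


# Hypothetical toy data: 0 = fully paid (majority), 1 = default (minority).
rows = list(range(100))          # stand-ins for loan feature vectors
labels = [0] * 90 + [1] * 10

sampled_rows, sampled_labels = random_under_sample(rows, labels)
weights = balanced_class_weights(labels)

print("class counts after under-sampling:", Counter(sampled_labels))
print("class weights on the original data:", weights)
```

Under-sampling changes the training data, whereas class weights leave the data intact and change the loss; the two are usually applied as alternatives rather than stacked, since combining them would compensate for the imbalance twice.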

