Abstract

In social lending, loans are made directly between peers, so the investor bears the loss when a borrower fails to repay, and accurate prediction of borrower default is therefore important. A loan's repayment outcome is known only after the repayment period ends, so labeled data are limited. Social loans, however, are matched online in real time, which generates large amounts of unlabeled data. In this paper, we propose a method that combines label propagation and the transductive support vector machine (TSVM) using Dempster–Shafer theory to predict social lending defaults accurately by exploiting unlabeled data. To learn effectively from this large volume of data, we ensemble semi-supervised learning methods with different characteristics. Label propagation assigns data with similar features to the same class, while TSVM pushes data with different features apart. Dempster–Shafer fusion exploits the merits of both methods to label the data accurately. Experiments are performed on the open data set from Lending Club. The proposed method improves accuracy by about 10% over a model trained on labeled data only, and the proposed ensemble method enables more accurate labeling.
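
The fusion step can be illustrated with Dempster's rule of combination. The following is a minimal sketch, not the paper's implementation: the abstract does not say how the outputs of the two classifiers are converted into mass functions, so the names `lp_mass` and `tsvm_mass` and all numeric values are hypothetical. Each classifier assigns mass to "default", to "repaid", and to the undecided set containing both.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to a mass,
    and the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence supports the shared hypotheses.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint hypotheses contribute to the conflict mass.
            conflict += mass_a * mass_b
    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalize so the fused masses sum to 1.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


DEFAULT = frozenset({"default"})
REPAID = frozenset({"repaid"})
EITHER = DEFAULT | REPAID  # mass left here expresses uncertainty

# Hypothetical masses for one unlabeled loan (illustrative values only).
lp_mass = {DEFAULT: 0.6, REPAID: 0.1, EITHER: 0.3}    # from label propagation
tsvm_mass = {DEFAULT: 0.5, REPAID: 0.2, EITHER: 0.3}  # from TSVM

fused = combine(lp_mass, tsvm_mass)
# Pick the single-class hypothesis with the larger fused mass.
label = max((DEFAULT, REPAID), key=lambda h: fused.get(h, 0.0))
print(sorted(label), round(fused[DEFAULT], 3))
```

Because both hypothetical sources lean toward "default", the fused mass on "default" is higher than either source's own mass for it, which is the sense in which fusion can sharpen a label that each method alone holds with less certainty.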
