Predicting loan default in peer‐to‐peer lending using narrative data

Yufei Xia, Yanlin Ding, Yinguo Li, Lingyun He, Nana Liu

doi:10.1002/for.2625

Abstract

Peer‐to‐peer (P2P) lending faces severe information asymmetry and relies heavily on internal credit scoring systems. This paper proposes a novel credit scoring model that forecasts the probability of default for each applicant and guides lenders' decision‐making in P2P lending. The proposal is expected to improve on existing P2P credit scoring models in two respects: the classifier and the use of narrative data. We use an advanced gradient boosting decision tree technique, CatBoost, to predict loan defaults. In addition, we develop a soft information extraction technique based on keyword clustering to compensate for insufficient hard credit data. Validated on three real‐world datasets, the experimental results show that variables extracted from narrative data are powerful features and that using narrative data significantly improves predictive performance relative to using hard information alone. Sensitivity analysis reveals that CatBoost outperforms the industry benchmark across different numbers of clusters for the extracted soft information; a small number of clusters (e.g., three) is preferred when model performance, computational cost, and comprehensibility are all considered. Finally, we discuss practical implications and explanatory considerations.
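To make the idea of turning narrative data into model inputs concrete, here is a minimal sketch of one plausible reading of cluster-based soft features. It is not the authors' implementation: the keyword list, the cluster assignments in `KEYWORD_CLUSTER`, and the helper names `soft_features` and `augment` are all hypothetical, and the clustering step itself (grouping keywords into a small number of clusters, three here) is assumed to have been done already.

```python
import re
from collections import Counter
from typing import List

# Hypothetical keyword-to-cluster assignment (an assumption for illustration).
# In practice it would come from clustering keywords found in loan narratives;
# that clustering step is not shown here.
KEYWORD_CLUSTER = {
    "debt": 0, "consolidate": 0, "credit": 0,
    "business": 1, "inventory": 1,
    "family": 2, "wedding": 2, "medical": 2,
}
N_CLUSTERS = 3  # the abstract reports a small number (e.g., three) is preferred


def soft_features(narrative: str) -> List[int]:
    """Count how many keyword occurrences in the narrative fall in each cluster."""
    tokens = re.findall(r"[a-z]+", narrative.lower())
    counts = Counter(KEYWORD_CLUSTER[t] for t in tokens if t in KEYWORD_CLUSTER)
    return [counts.get(c, 0) for c in range(N_CLUSTERS)]


def augment(hard_row: List[float], narrative: str) -> List[float]:
    """Append the soft (narrative) features to a row of hard credit features."""
    return hard_row + [float(x) for x in soft_features(narrative)]


text = "I want to consolidate my credit card debt before the wedding."
print(soft_features(text))          # one count per cluster
print(augment([0.2, 36.0], text))   # hard features followed by soft features
```

Rows built this way could then be passed to a gradient boosting classifier such as CatBoost alongside the hard credit variables; the per-cluster count is only one of several possible encodings.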

Journal: Journal of Forecasting
Publication Date: Aug 22, 2019
Citations: 61

