Abstract
Peer‐to‐peer (P2P) lending faces severe information asymmetry and depends heavily on internal credit scoring systems. This paper proposes a novel credit scoring model that forecasts the probability of default for each applicant and guides lenders' decision‐making in P2P lending. The proposal is expected to improve on existing P2P credit scoring models in two respects: the classifier and the use of narrative data. We use an advanced gradient boosting decision tree technique (i.e., CatBoost) to predict loan defaults. Moreover, we develop a soft information extraction technique based on keyword clustering to compensate for insufficient hard credit data. Experiments on three real‐world datasets demonstrate that variables extracted from narrative data are powerful features, and that using narrative data significantly improves predictive performance relative to using hard information alone. Sensitivity analysis reveals that CatBoost outperforms the industry benchmark across different numbers of soft‐information clusters; meanwhile, a small number of clusters (e.g., three) is preferred on grounds of model performance, computational cost, and comprehensibility. Finally, we discuss practical implications and explanatory considerations.