Abstract
As peer-to-peer (P2P) lenders evaluate the potential risk of each loan application, they may rely on subjective judgement when given qualitative information. Academics have found loan approval rates to be associated with the borrower's personality traits, social capital, and appearance. However, the association between a borrower's language and probability of default has yet to be considered. In this paper, we show that there are statistically significant linguistic differences in the free-text Lending Club loan descriptions between loans that default and loans that are fully paid. By engineering new features for non-standard language and spelling errors, applying natural language processing techniques, and running multivariate logistic regression analyses, we find that slang words, shorthand abbreviations, and spelling errors are all associated with a higher likelihood of default when controlling for the borrower's income and loan amount. However, neither the type of error (orthographic or phonological) nor its egregiousness affects the probability of default. Finally, we discuss the ethical implications of potential discriminatory bias given the association between poor spelling and disability status (e.g. dyslexia), national origin (i.e. English language familiarity), and personality traits (carelessness), laying the foundation for future work on bias in P2P lending and other scenarios involving applicant-oriented risk assessments.
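To make the feature-engineering step concrete, the following is a minimal sketch of how counts of slang, abbreviations, and spelling errors might be extracted from a loan description. It is an illustration under stated assumptions, not the paper's implementation: the word lists `SLANG` and `ABBREVIATIONS`, the `vocabulary` argument, and the function name `language_features` are all hypothetical, and the abstract does not specify which lexicons or spell-checker were actually used.

```python
import re

# Hypothetical, tiny lexicons for illustration only; the actual word lists
# and spell-checking method are not given in the abstract.
SLANG = {"gonna", "wanna", "gotta", "kinda"}
ABBREVIATIONS = {"thx", "pls", "asap", "b/c", "w/"}


def language_features(description, vocabulary):
    """Count non-standard language in one free-text loan description.

    `vocabulary` is an assumed set of correctly spelled lowercase words;
    any token outside it (and outside the two lexicons) counts as a
    spelling error.
    """
    # Lowercase and split into tokens made of letters, slashes, apostrophes.
    tokens = re.findall(r"[a-z/']+", description.lower())
    slang = sum(t in SLANG for t in tokens)
    abbreviations = sum(t in ABBREVIATIONS for t in tokens)
    misspellings = sum(
        t not in vocabulary and t not in SLANG and t not in ABBREVIATIONS
        for t in tokens
    )
    return {
        "n_tokens": len(tokens),
        "slang": slang,
        "abbreviations": abbreviations,
        "misspellings": misspellings,
    }


# Made-up example text and vocabulary, not taken from the Lending Club data.
example_vocabulary = {"i", "consolidate", "my", "credit", "card", "debt"}
example = language_features(
    "Thx, I wanna consolidate my credit card dept asap", example_vocabulary
)
```

In an analysis of the kind described above, counts like these would enter a logistic regression as predictors of default, with the borrower's income and loan amount included as controls.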