Abstract

Credit scoring tools are frequently used by lenders to identify bad borrowers who cannot fully repay their liabilities. This is a classic classification problem with imbalanced samples, in which bad loans make up only a small proportion of all applications. Various machine learning techniques have been applied to default prediction over the past few decades. In this paper, we aim to capture early-defaulting borrowers on an online lending platform, who are likely to be fraudsters, using a multi-layer structured Gradient Boosted Decision Tree model built on Light Gradient Boosting Machines (ML-LightGBM). Because the sample distribution is extremely imbalanced and misclassification is costly, we further apply a cost-sensitive framework to the loss function of the classification models in order to improve predictive accuracy. The empirical results, based on a sample of 1.6 million online loans, show that the proposed cost-sensitive ML-LightGBM algorithm outperforms other predictive models. This suggests that cost-sensitive ML-LightGBM is a promising technique for fraud detection and credit scoring.
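
To make the idea of a cost-sensitive loss concrete, the following is a minimal sketch of a class-weighted binary log loss, together with the gradient and Hessian that a gradient boosting learner would consume. The abstract does not state the exact loss used in the paper, so the function name `cost_sensitive_logloss` and the cost values `cost_fn` and `cost_fp` are hypothetical and chosen only for illustration.

```python
import numpy as np

def cost_sensitive_logloss(y_true, raw_score, cost_fn=10.0, cost_fp=1.0):
    """Class-weighted binary log loss (illustrative, not the paper's exact loss).

    cost_fn: assumed cost of missing a bad loan (label 1).
    cost_fp: assumed cost of rejecting a good loan (label 0).
    Returns per-sample loss, gradient and Hessian w.r.t. the raw score.
    """
    y = np.asarray(y_true, dtype=float)
    z = np.asarray(raw_score, dtype=float)
    p = 1.0 / (1.0 + np.exp(-z))              # predicted default probability
    w = np.where(y == 1.0, cost_fn, cost_fp)  # per-sample misclassification cost
    eps = 1e-12                               # guards against log(0)
    loss = -w * (y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
    grad = w * (p - y)                        # first derivative w.r.t. raw score
    hess = w * p * (1.0 - p)                  # second derivative w.r.t. raw score
    return loss, grad, hess

# One bad loan and one good loan, both scored at raw score 0 (p = 0.5).
loss, grad, hess = cost_sensitive_logloss([1, 0], [0.0, 0.0])
```

With these assumed costs, an error on the bad loan pulls the model ten times harder than the same error on a good loan, which is how a cost-sensitive loss counteracts the scarcity of the minority class.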
