Abstract

Bank credit products are a widely used means of payment, and more and more people are accessing credit cards, loans, and similar products. Banks have traditionally relied on classic prediction models, the vast majority based on logistic regression, because it gives the business a highly interpretable view of the model variables and their effects. The purpose of this research is to predict the probability of customer default in a credit card portfolio using a risk score. The dataset used is the "default of credit card clients" Data Set from the UCI Machine Learning Repository. The approach is quantitative and the methodology is descriptive analytics, with gradient boosting techniques used to make the prediction. The trained algorithms include Logistic Regression with Weight of Evidence (WOE), XGBoost, CatBoost, and the Light Gradient Boosting Machine (LightGBM). The best result was obtained with LightGBM tuned with a Bayesian search, which reached a Gini of 57.4, an improvement of 6 points over Logistic Regression with WOE and of 3 points over XGBoost and CatBoost. Finally, computing the Gain and Shapley values compensated for the model's lack of variable interpretability, allowing better decision making when evaluating clients. As future work, we intend to add unstructured variables to improve the model's indicators.
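The abstract compares models by their Gini coefficient without defining it. In credit scoring, Gini is commonly derived from the area under the ROC curve as Gini = 2 · AUC − 1. The sketch below illustrates that relationship in plain Python. It is not the authors' evaluation code, and the function names `auc_from_scores` and `gini` are hypothetical. Assuming the reported figure is on a 0 to 100 scale, a Gini of 57.4 would correspond to an AUC of roughly 0.787.

```python
def auc_from_scores(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation.

    y_true: iterable of 0/1 labels (1 = default), scores: predicted risk.
    Tied scores receive their average rank.
    Hypothetical helper for illustration only.
    """
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def gini(y_true, scores):
    """Gini coefficient as commonly used in credit scoring: 2 * AUC - 1."""
    return 2 * auc_from_scores(y_true, scores) - 1


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    risk = [0.1, 0.4, 0.35, 0.8]
    print(gini(labels, risk))  # 0.5, since AUC is 0.75 for this toy data
```

A Gini of 0 corresponds to a score that ranks defaulters no better than chance, and a Gini of 1 to a score that ranks every defaulter above every non-defaulter.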
