Credit fraud modeling is an active research topic. Overdue risk management is a critical part of providing credit loan services, and it directly affects the rate of return and the bad-debt ratio of lending organizations. With the development of the mobile Internet, credit financial services have reached the general public, and overdue risk control has evolved from rule-based manual judgment to credit models built on large amounts of customer data that predict the likelihood of a customer becoming delinquent. When building a credit rating model, the nature of credit samples means that minority-class samples remain very few even when a large number of actual samples are obtained, which biases machine learning models towards the majority class during training. Traditional data balancing methods can reduce this bias when the data is moderately rather than extremely imbalanced. This paper proposes gradient boosting algorithms (XGBoost and CatBoost) to model highly imbalanced data for credit fraud detection. Bayesian optimization is used to search for hyperparameters, with accuracy on the minority class as the objective function, in order to increase the model's accuracy for the minority class. The approach was tested on real European credit card fraud data. The results were compared with traditional machine learning models (decision trees and logistic regression) and with a bagging algorithm (random forest). For comparison, the traditional data balancing method (Oversample) is used
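The majority-class bias and the oversampling baseline mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the paper's pipeline: XGBoost, CatBoost, and Bayesian optimization are not shown. The toy labels, the helper names (`accuracy`, `minority_recall`, `random_oversample`), and the use of recall as the minority-class metric are assumptions made here for illustration.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def minority_recall(y_true, y_pred, minority=1):
    """Fraction of true minority-class samples that were predicted as minority."""
    total = sum(1 for t in y_true if t == minority)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == minority and p == minority)
    return hits / total if total else 0.0


def random_oversample(X, y, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size.

    Assumes a binary problem in which every label other than `minority`
    belongs to the majority class.
    """
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    if not minority_idx:
        raise ValueError("no minority-class samples to oversample")
    rng = random.Random(seed)
    majority_count = len(y) - len(minority_idx)
    n_extra = max(0, majority_count - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy data invented for illustration: 998 legitimate (0) and 2 fraudulent (1) rows.
y = [0] * 998 + [1] * 2
X = [[float(i)] for i in range(len(y))]

# A degenerate "model" that always predicts the majority class.
y_pred = [0] * len(y)
print(accuracy(y, y_pred))         # 0.998
print(minority_recall(y, y_pred))  # 0.0

# After oversampling, both classes have 998 rows.
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

The always-majority predictor scores high on overall accuracy while detecting no fraud at all, which is why a minority-class metric is a more informative optimization target on data like this. Random oversampling only duplicates existing minority rows, so it adds no new information about the fraud class.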