XGBoost Optimized by Adaptive Particle Swarm Optimization for Credit Scoring

Chao Qin, Fangxun Bao, Peipei Liu, Peide Liu, Caiming Zhang, Yunfeng Zhang, Sotiris B Kotsiantis

doi:10.1155/2021/6655510

Abstract

Personal credit scoring is a challenging problem. Recent research has shown that machine learning performs well in credit scoring. Because they combine and select features naturally, decision trees suit credit data, which are high dimensional and have complex correlations. However, decision trees tend to overfit. eXtreme Gradient Boosting (XGBoost) is an advanced gradient boosting tree method that overcomes this shortcoming by integrating tree models. The structure of the model is determined by hyperparameters; because manual tuning is time-consuming and laborious, an optimization method is employed for tuning. As particle swarm optimization describes the particle state and its motion law as continuous real numbers, the hyperparameters of eXtreme Gradient Boosting can be optimized in a continuous search space. However, classical particle swarm optimization tends to fall into local optima. To solve this problem, this paper proposes an eXtreme Gradient Boosting credit scoring model based on adaptive particle swarm optimization. A swarm split, based on the clustering idea, and two kinds of learning strategies are employed to guide the particles and improve the diversity of the subswarms, which prevents the algorithm from falling into a local optimum. In the experiment, several traditional machine learning algorithms and popular ensemble learning classifiers, as well as four hyperparameter optimization methods (grid search, random search, tree-structured Parzen estimator, and particle swarm optimization), are considered for comparison. Experiments were performed on four credit datasets and seven KEEL benchmark datasets using five popular evaluation measures: accuracy, error rate (type I error and type II error), Brier score, and F1 score. The results demonstrate that the proposed model outperforms the other models on average. Moreover, adaptive particle swarm optimization performs better than the other hyperparameter optimization strategies.
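As a rough illustration of why particle swarm optimization fits continuous hyperparameter search, the following is a minimal sketch of classical (not adaptive) PSO in plain Python. It does not reproduce the swarm split or the learning strategies proposed in the paper. The function names `pso_minimize` and `toy_loss`, the coefficient values, and the search bounds are all assumptions made for this example; `toy_loss` is a hypothetical stand-in for a cross-validated loss of an XGBoost model.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Classical global-best PSO over a box-constrained continuous space."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clip the particle back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_loss(params):
    # Hypothetical objective with its minimum at learning_rate=0.1,
    # max_depth=6; a real run would return a validation loss instead.
    learning_rate, max_depth = params
    return (learning_rate - 0.1) ** 2 + (max_depth - 6.0) ** 2 / 100


best, best_val = pso_minimize(toy_loss, bounds=[(0.01, 0.5), (1.0, 12.0)])
print(best, best_val)
```

Integer hyperparameters such as tree depth would need rounding before being passed to a model. Because every particle is pulled toward the same global best, this classical form can stall in a local optimum, which is the weakness the adaptive variant targets.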

Highlights

Granting loans to potential borrowers is the core operation of lending establishments around the world. The loan business brings huge profits to a company, but it also exposes the company to huge financial losses. Therefore, lending institutions need to comprehensively analyse the basic information and credit histories of applicants to estimate the possibility of repayment and decide whether to approve the application. An acceptable credit scoring method can help lending institutions identify good applicants among loan applications and reject the unacceptable ones.
A neural network (NN) is an information processing model that uses structures similar to synaptic connections in the brain and is improved by iteratively adjusting weights to minimize the prediction error. The capability of NNs to handle nonlinear data helps identify intrinsic patterns in complex financial credit data.
Chuang and Huang [4] proposed a hybrid credit scoring model based on an NN, in which the first part of the model divides applications into an accepted group and a rejected group. The results show that the model obtains a more accurate result than the other compared methods; it has also been shown to reverse potential customer churn. The proposed model reinforced the NN primarily to enhance accuracy, without doing much to reduce misclassification.

Summary

Introduction

Granting loans to potential borrowers is the core operation of lending establishments around the world. The loan business brings huge profits to a company, but it also exposes the company to huge financial losses. Therefore, lending institutions need to comprehensively analyse the basic information and credit histories of applicants to estimate the possibility of repayment and decide whether to approve the application. The results indicated satisfactory predictability of the model. Although these methods add different models to improve the performance of NNs, NNs still have limitations: they lack the ability to explain lending decisions, are time-consuming to train, and are prone to overfitting [6]. According to the “no free lunch” theorem [12], and because the structure and characteristics of credit data vary, the prediction accuracy of a single classifier is greatly limited. On the German, Australian, and P2P datasets, the ensemble model outperforms single classifiers (LR, SVM, and DT) in accuracy, AUC, and Brier score. Shen et al. [25] proposed an ensemble model that combines the AdaBoost method with an NN base classifier and employs PSO to search for the optimal connection weights of the NN.
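For readers unfamiliar with the measures used in such comparisons, the sketch below computes accuracy, type I and type II error rates, Brier score, and F1 score from predicted probabilities. The function name `credit_metrics`, the 0.5 threshold, and the toy data are assumptions for illustration. The error convention is also an assumption: class 1 is treated as the positive class, type I error as the false positive rate, and type II error as the false negative rate. Credit scoring papers differ on this convention, so it may not match the definition used in this paper.

```python
def credit_metrics(y_true, y_prob, threshold=0.5):
    """Binary classification measures from labels and predicted P(class 1)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = len(y_true)

    def ratio(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, n),
        # Assumed convention: type I = false positive rate.
        "type_1_error": ratio(fp, fp + tn),
        # Assumed convention: type II = false negative rate.
        "type_2_error": ratio(fn, fn + tp),
        # Brier score: mean squared gap between probability and outcome.
        "brier": ratio(sum((p - t) ** 2 for t, p in zip(y_true, y_prob)), n),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy labels and probabilities, invented for this example.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
metrics = credit_metrics(y_true, y_prob)
print(metrics)
```

Unlike the threshold-based measures, the Brier score uses the predicted probabilities directly, so it rewards well-calibrated scores as well as correct class decisions; lower is better.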

Related Work

Materials and Methods

APSO-XGBoost Credit Scoring Model

Experimental Setup