As the digital economy continues to grow, the expansion of Internet finance poses new challenges for the conventional banking industry. Banks face multiple pressures, including digital transformation, declining customer loyalty, and fintech competition. Analyzing the potential drivers of bank customer churn from multiple perspectives and building churn prediction models can help bank managers understand why customers leave, identify problems, detect customers at risk of churning promptly, and develop effective retention strategies based on customer characteristics and preferences. In this paper, we used a combination of visualization, data mining, and machine learning methods to analyze the predictors of bank customer churn from multiple perspectives, including feature selection (Random Forest feature importance ranking), feature extraction (PCA), and visualization. We also constructed two churn prediction models based on the gradient boosting tree algorithms XGBoost and LightGBM, compared their evaluation metrics before and after feature selection and before and after parameter tuning, and interpreted the models using SHAP. The study yielded the following conclusions: (1) Total Trans Amt, Total Trans Ct, and Total Revolving Bal are pivotal in analyzing and predicting customer churn; (2) the SHAP summary plot reflects, to a certain extent, the findings of the visual analysis of churn predictors; (3) the effect of feature selection on the evaluation results is sometimes insignificant; (4) parameter tuning can improve model performance to a certain extent, but the optimal parameters may vary with the preprocessing method employed. These conclusions can help banks understand the factors behind customer churn more deeply, build higher-performing churn prediction models, and analyze the results comprehensively.
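The comparison "before and after feature selection" described above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes synthetic data from scikit-learn's `make_classification` in place of the bank dataset, uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost and LightGBM so that the example needs only one library, and uses hypothetical names (`feature_i`, `top_k`, `fit_and_score`) and an arbitrary cut-off of six features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for a churn table (assumption: NOT the paper's data).
X, y = make_classification(
    n_samples=2000, n_features=12, n_informative=4, n_redundant=2,
    weights=[0.85, 0.15], random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1: rank features by Random Forest impurity-based importance.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
ranking = np.argsort(forest.feature_importances_)[::-1]  # most important first

top_k = 6  # arbitrary cut-off chosen for illustration
selected = ranking[:top_k]


def fit_and_score(columns):
    """Train a gradient boosting model on the given columns and return test AUC."""
    model = GradientBoostingClassifier(random_state=0)
    model.fit(X_train[:, columns], y_train)
    churn_probability = model.predict_proba(X_test[:, columns])[:, 1]
    return roc_auc_score(y_test, churn_probability)


# Step 2: compare the same model before and after feature selection.
auc_all = fit_and_score(np.arange(X.shape[1]))
auc_selected = fit_and_score(selected)

print("Top features:", [feature_names[i] for i in selected])
print(f"AUC with all features:      {auc_all:.3f}")
print(f"AUC with selected features: {auc_selected:.3f}")
```

On real data the two scores may be close, which would be consistent with conclusion (3); the same `fit_and_score` pattern could be reused to compare default and tuned hyperparameters.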