Predicting the probability of customer churn provides an important reference for formulating and implementing customer retention strategies. Compared with a single classifier, ensemble learning methods generalize better, and ensemble learning-based customer churn prediction has gradually become a research focus. However, customer churn data are usually class-imbalanced and the classes often overlap, which degrades the predictive performance of ensemble learning models. Therefore, an ensemble learning method based on adaptive clustering mixed-sampling (CUS-Ensemble) is proposed. The method is built on Bagging and uses the gradient boosting decision tree as the base classifier. First, the whole training set is cleansed. Second, a clustering under-sampling method is used to sample the majority class instances so that the sampled majority class instances slightly outnumber the minority class instances. Third, the sampled majority class instances and all minority class instances form a new training subset, which is cleansed again. Next, Borderline-SMOTE is used to balance the cleansed subset, which is then fed into a gradient boosting decision tree. Finally, the outputs of several gradient boosting decision trees are integrated as the final prediction result. Experimental results on six imbalanced customer datasets show that, compared with EasyEnsemble and other comparison methods, the AUC and Recall of this method increase by 4% and 13.6% on average, respectively, which helps reduce the losses caused by customer churn.
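
The mixed-sampling stage described above can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation. The function names, the k-means clustering with per-cluster proportional sampling, the 1.2 majority-to-minority ratio, and the neighbour counts m and k are all assumptions made for illustration. The two data cleansing steps are omitted because the abstract does not name the cleansing method, and training and combining the gradient boosting decision trees is left out; each returned subset would be the training data for one such tree.

```python
import math
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of points."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return [members for members in clusters if members]


def cluster_undersample(majority, target, k, rng):
    """Draw roughly `target` majority points, proportionally from each cluster (assumed scheme)."""
    sampled = []
    for members in kmeans(majority, k, rng):
        quota = max(1, round(target * len(members) / len(majority)))
        sampled.extend(rng.sample(members, min(quota, len(members))))
    return sampled


def borderline_smote(minority, majority, n_new, rng, m=5, k=3):
    """Borderline-SMOTE (variant 1): interpolate only around 'danger' minority points."""
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    danger = []
    for p in minority:
        others = [q for q in labelled if q[0] is not p]
        neighbours = sorted(others, key=lambda q: math.dist(p, q[0]))[:m]
        n_major = sum(1 for _, label in neighbours if label == 0)
        if m / 2 <= n_major < m:  # mostly, but not only, majority neighbours
            danger.append(p)
    seeds = danger or minority  # assumption: fall back to all minority points
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(seeds)
        mates = sorted((q for q in minority if q is not p),
                       key=lambda q: math.dist(p, q))[:k]
        q = rng.choice(mates)
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic


def mixed_sample(majority, minority, rng, ratio=1.2, k_clusters=3):
    """Under-sample the majority class, then oversample the minority class to match it."""
    target = min(len(majority), math.ceil(ratio * len(minority)))
    maj = cluster_undersample(majority, target, k_clusters, rng)
    n_new = max(0, len(maj) - len(minority))
    mino = minority + borderline_smote(minority, maj, n_new, rng)
    return maj, mino


def build_bags(majority, minority, n_bags=5, seed=0):
    """One balanced (majority, minority) subset per base classifier."""
    rng = random.Random(seed)
    return [mixed_sample(majority, minority, rng) for _ in range(n_bags)]
```

Calling build_bags with lists of feature tuples for the two classes yields one balanced subset per bag. Because the k-means initialization and the per-cluster draws are random, the subsets differ from bag to bag, which supplies the diversity that a Bagging-style ensemble relies on.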