Customer churn remains a critical concern for businesses, highlighting the significance of retaining existing customers over acquiring new ones. Effective prediction of potential churners aids in devising robust retention policies and efficient customer management strategies. This study dives into the realm of machine learning algorithms for predictive analysis in churn prediction, addressing the inherent challenge posed by diverse and imbalanced customer churn data distributions. This paper introduces a novel approach—the Ratio-based data balancing technique, which addresses data skewness as a pre-processing step, ensuring improved accuracy in predictive modelling. This study fills gaps in existing literature by highlighting the effectiveness of ensemble algorithms and the critical role of data balancing techniques in optimizing churn prediction models. While our research contributes a novel approach, there remain avenues for further exploration. This work evaluates several machine learning algorithms—Perceptron, Multi-Layer Perceptron, Naive Bayes, Logistic Regression, K-Nearest Neighbour, Decision Tree, alongside Ensemble techniques such as Gradient Boosting and Extreme Gradient Boosting (XGBoost)—on balanced datasets achieved through our proposed Ratio-based data balancing technique and the commonly used Data Resampling. Results reveal that our proposed Ratio-based data balancing technique notably outperforms traditional Over-Sampling and Under-Sampling methods in churn prediction accuracy. Additionally, using combined algorithms like Gradient Boosting and XGBoost showed better results than using single methods. Our study looked at different aspects like Accuracy, Precision, Recall, and F-Score, finding that these combined methods are better for predicting customer churn. Specifically, when we used a 75:25 ratio with the XGBoost method, we got the most promising results for our analysis which are presented in this work.
Read full abstract