Customer churn is a critical issue in banking because sustained profitability depends on retaining clients. This work aims to classify customer churn in banks using a new ensemble approach that combines several machine learning methods, thereby enhancing churn prediction models. The study uses a comprehensive dataset of demographic, financial, and behavioral features (such as credit score, account balance, tenure, and activity level), with a target variable indicating whether a customer has left the bank. The analysis begins with univariate, bivariate, and multivariate feature exploration and then applies the Interquartile Range (IQR) method to identify outliers, thereby improving data quality. Five models, K-Nearest Neighbors (KNN), Support Vector Classifier (SVC), Random Forest, Decision Tree, and XGBoost, are combined in a Voting Classifier ensemble to predict churn. By building on the strengths of each model, this approach improves classification performance and provides a balanced, robust classifier. Without SMOTE (Synthetic Minority Over-sampling Technique), the Voting Classifier achieves an accuracy of 0.87, precision of 0.87, recall of 0.80, and F1-score of 0.87. The proposed model, which extends this base model with SMOTE, achieves higher scores: accuracy of 0.90, precision of 0.90, recall of 0.90, and F1-score of 0.90. This improvement demonstrates the effectiveness of SMOTE in handling class imbalance, making churn prediction more balanced and reliable. The proposed approach offers a reliable basis for customer retention strategies in banking organisations.
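The three techniques named above (IQR outlier filtering, SMOTE oversampling, and voting across classifiers) can be illustrated with a minimal plain-Python sketch. This is not the study's implementation, which is not described here in enough detail to reproduce. All function names are hypothetical, the fence multiplier of 1.5 is the conventional default rather than a value the abstract states, the oversampler is a simplified SMOTE-style interpolation, and hard (majority) voting is assumed because the abstract does not say whether hard or soft voting was used.

```python
import random
from collections import Counter
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) IQR fences; k=1.5 is an assumed default."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


def smote_like(minority, n_new, k=2, seed=0):
    """Simplified SMOTE-style oversampling (illustrative only).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def hard_vote(predictions):
    """Return the majority label among the individual model predictions."""
    return Counter(predictions).most_common(1)[0][0]


# Toy data, invented for illustration; not taken from the study's dataset.
balances = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
churners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
five_model_votes = [1, 0, 1, 1, 0]  # one prediction per base model

print(drop_outliers(balances))
print(smote_like(churners, n_new=2))
print(hard_vote(five_model_votes))
```

In practice, a pipeline like this would more likely rely on library implementations than on hand-written helpers. Oversampling is normally applied only to the training split so that synthetic samples do not leak into the evaluation data.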