Abstract

Customer churn is a problem that affects firms in a variety of industries. Every time a client departs, the firm incurs a significant loss. Churn prediction refers to determining which consumers are most likely to cancel a service subscription based on how they use it. It is a crucial prediction for many firms because acquiring new customers is sometimes more expensive than retaining existing ones. Our proposed methodology has six phases. Data pre-processing and exploratory data analysis are performed in the first two phases. In the third phase, feature selection is considered; after that, the data is divided into a training set and a test set in a ratio of 80% to 20%. The most prominent prediction models, namely logistic regression, naive Bayes, support vector machine (SVM), and random forest, were trained on the training set, and ensemble approaches were applied to assess their effect on model accuracy. In addition, k-fold cross-validation was applied for hyperparameter tuning and to avoid overfitting. Finally, the ROC curve and the area under it (AUC) were used to analyse the results obtained on the test set. Random Forest and SVM showed the highest accuracy, 87 percent and 84 percent, respectively. Random Forest achieves the highest AUC score of 94.5 percent, followed by SVM at 92.1 percent, and both outperform the other classifiers.
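The evaluation steps named in the abstract (an 80/20 split, k-fold cross-validation, and AUC) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names `train_test_split`, `k_fold_indices`, and `roc_auc` are hypothetical, the value k=5 is an assumption because the abstract does not state k, and the AUC is computed from its pairwise-ranking definition rather than from any particular library.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test) portions."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous folds.

    k=5 is an assumed default; the abstract does not give the value used.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In such a setup, the folds would be drawn from the training portion only, so that the held-out 20% is used once, for the final AUC comparison between classifiers.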
