Abstract Churn risk is one of the most pressing concerns in the telecommunications industry. Methods for predicting churn have improved considerably thanks to advances in the world of artificial intelligence and machine learning. In this context, a comparative study of four machine learning models was conducted. The first phase consists of data preprocessing, followed by feature analysis in the second phase and feature selection in the third. The data is then split into a training set and a test set. During the prediction phase, four commonly used predictive models were adopted, namely k-nearest neighbors, logistic regression, random forest, and support vector machine. Cross-validation on the training set was used to tune the hyperparameters of each model, both to improve performance and to avoid overfitting. The results obtained on the test set were evaluated using feature weights, the confusion matrix, accuracy, precision, recall, error rate, and F1 score. The support vector machine model outperformed the other prediction models, with an accuracy of 96.92%.
Keywords: Churn Prediction, Classification Algorithms, Hyperparameter Optimization, Machine Learning, Telecommunications.
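The evaluation metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' implementation. It assumes the positive class means "churned". The class `ConfusionMatrix`, the helper `confusion_from_labels`, and the counts in the demo are hypothetical and chosen only for illustration; they are not results from the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix. Assumption: positive (1) means 'churned'."""
    tp: int  # churners correctly predicted as churners
    fp: int  # non-churners wrongly predicted as churners
    fn: int  # churners wrongly predicted as non-churners
    tn: int  # non-churners correctly predicted as non-churners

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def accuracy(self) -> float:
        # Share of all predictions that are correct.
        return (self.tp + self.tn) / self.total

    def error_rate(self) -> float:
        # Share of all predictions that are wrong (1 - accuracy).
        return (self.fp + self.fn) / self.total

    def precision(self) -> float:
        # Of the customers flagged as churners, how many really churned.
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        # Of the customers who really churned, how many were flagged.
        actual = self.tp + self.fn
        return self.tp / actual if actual else 0.0

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionMatrix:
    """Count the four outcomes from 0/1 label sequences of equal length."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return ConfusionMatrix(tp=tp, fp=fp, fn=fn, tn=tn)


if __name__ == "__main__":
    # Made-up counts for illustration only; not taken from the paper.
    cm = ConfusionMatrix(tp=40, fp=10, fn=20, tn=130)
    print(f"accuracy={cm.accuracy():.4f} error_rate={cm.error_rate():.4f}")
    print(f"precision={cm.precision():.4f} recall={cm.recall():.4f} f1={cm.f1():.4f}")
```

Churn data sets are often imbalanced, with far fewer churners than non-churners, so accuracy alone can look high while many churners are missed. Reporting precision, recall, and F1 alongside accuracy, as the abstract does, guards against that.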