Abstract

The telecommunication industry faces tough competition to retain customers, and churn prediction has therefore become an active research area in machine learning and data mining. Because customers' churn behavior must be monitored closely and efficiently, a methodical churn prediction model is required. The main obstacles to achieving the desired performance from a classifier are enormous datasets, a large feature space, and an imbalanced class distribution. In this work, we explore the use of the Synthetic Minority Over-sampling TEchnique (SMOTE) to reduce class imbalance, in combination with different feature reduction techniques: correlation feature extraction, gain ratio, information gain, and the OneR feature evaluation method. Classification and Regression Trees (CART), Bagged CART, and Partial Decision Trees (PART) classifiers are trained on the balanced, reduced-feature-space dataset to analyze their performance. Prediction performance is evaluated with the Area Under the Curve (AUC), sensitivity, and specificity. Simulations show that our proposed method, based on SMOTE, correlation feature extraction, and ensembling, predicts churners better than simply applying learners to the unrefined dataset. This methodology can therefore help the telecommunication industry predict churn.
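SMOTE balances the classes by generating synthetic minority-class (here, churner) samples, each placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours. The following is a minimal NumPy sketch of that interpolation step, not the authors' implementation. It assumes numeric, already-scaled features and Euclidean distance, and it draws base samples at random rather than iterating over every minority sample in turn. The function name `smote` and its parameters are hypothetical.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, seed=None):
    """Hypothetical SMOTE sketch: create n_synthetic samples by interpolating
    between minority samples and their k nearest minority-class neighbours.

    Assumes numeric, pre-scaled features and Euclidean distance; base samples
    are chosen at random (a common variant of the original procedure).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # shape (n, k)

    # Pick a random base sample and one of its k neighbours for each new point.
    base = rng.integers(0, n, size=n_synthetic)
    pick = neighbours[base, rng.integers(0, k, size=n_synthetic)]

    # New point = base + gap * (neighbour - base), with gap drawn from [0, 1).
    gap = rng.random((n_synthetic, 1))
    return X_min[base] + gap * (X_min[pick] - X_min[base])


# Usage: oversample a toy two-feature minority class.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote(X_min, n_synthetic=6, k=2, seed=0)
print(synthetic.shape)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies. In practice, a maintained implementation such as the `SMOTE` class in imbalanced-learn would typically be applied to the training split only, before the feature reduction and classifier training steps.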
