Resolving Class Imbalance and Feature Selection in Customer Churn Dataset

Aamer Hanif, Noor Azhar

doi:10.1109/fit.2017.00022

Abstract

Churn prediction datasets pertaining to the telecom sector often suffer from the class imbalance problem. Because of the large number of features, dimensionality reduction (or feature selection) and dataset balancing become important data preprocessing steps. This research uses a real dataset to classify defecting customers in the telecom sector. Three different feature selection and dataset balancing techniques are applied for data preprocessing before the classification model is built. The results show that random oversampling performed better at balancing the dataset and that the three feature selection techniques used performed equally well. Customer call-related features emerge as the more important features. The classification model is built using the random forest technique, and model evaluation measures are computed and reported. Conducting experiments on a real dataset that does not have any customer demographic variables is a significant contribution of this paper.
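
To illustrate the random oversampling step named in the abstract, the following is a minimal sketch in Python that duplicates randomly chosen minority-class rows until every class has as many rows as the majority class. The abstract does not describe the authors' implementation, so the function name `random_oversample`, the `seed` parameter, and the toy data are all assumptions made for illustration only.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all
    classes match the majority class count. Hypothetical helper, not
    the authors' code."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Original rows belonging to this class
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement to fill the gap to the majority size
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (invented for this sketch): 6 non-churners (0), 2 churners (1).
rows = [[120, 3], [95, 1], [130, 2], [110, 0],
        [100, 1], [90, 2], [300, 9], [280, 7]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

In practice, oversampling is usually applied only to the training split, so that duplicated rows do not leak into the data used for model evaluation.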
