Abstract

Telecom companies log customers' actions, generating a huge amount of data that can yield important findings about customer behavior and needs. The main characteristics of such data are a large number of features and high sparsity, which pose challenges for the analytics steps. This paper explores dimensionality reduction on a real telecom dataset and evaluates customer clustering in the reduced, latent space compared with the original space, in order to achieve better-quality clustering results. The original dataset contains 220 features belonging to 100,000 customers. Dimensionality reduction is an important preprocessing step in the data mining process, especially in the presence of the curse of dimensionality. In particular, the aim of data reduction techniques is to filter out irrelevant features and noisy data samples. To reduce the high-dimensional data, we projected it onto a subspace using the well-known Principal Component Analysis (PCA) decomposition and a novel approach based on an autoencoder neural network. K-means clustering is then applied to both the original and the reduced datasets. Different internal measures were computed to evaluate the clustering for different numbers of dimensions, and we then evaluated how the reduction method affects the clustering task.
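The pipeline described in the abstract (reduce, cluster, score with an internal measure) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses small synthetic data rather than the telecom dataset, covers only the PCA branch (the autoencoder is omitted), and uses the Davies-Bouldin index as one common internal measure, since this section does not name the measures used. The function names `pca_reduce`, `kmeans`, and `davies_bouldin` are hypothetical helpers written with NumPy only.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal components (SVD of the centered data)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def assign(X, centers):
    """Return the index of the nearest center for every row of X."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm with random initial centers drawn from X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers

def davies_bouldin(X, labels):
    """Davies-Bouldin index: lower values indicate more compact, separated clusters."""
    ids = np.unique(labels)
    cents = np.array([X[labels == i].mean(axis=0) for i in ids])
    scatter = np.array([
        np.linalg.norm(X[labels == i] - c, axis=1).mean()
        for i, c in zip(ids, cents)
    ])
    worst = []
    for a in range(len(ids)):
        ratios = [
            (scatter[a] + scatter[b]) / np.linalg.norm(cents[a] - cents[b])
            for b in range(len(ids)) if b != a
        ]
        worst.append(max(ratios))
    return float(np.mean(worst))

# Synthetic stand-in data (assumption): 3 groups of 100 points in 20 dimensions.
rng = np.random.default_rng(42)
true_centers = rng.normal(scale=5.0, size=(3, 20))
X = np.vstack([c + rng.normal(size=(100, 20)) for c in true_centers])

labels_full, _ = kmeans(X, k=3)        # cluster in the original space
Z = pca_reduce(X, n_components=2)      # reduce to a 2-D subspace
labels_red, _ = kmeans(Z, k=3)         # cluster in the reduced space

db_full = davies_bouldin(X, labels_full)
db_reduced = davies_bouldin(Z, labels_red)
print(f"Davies-Bouldin, original space: {db_full:.3f}")
print(f"Davies-Bouldin, reduced space:  {db_reduced:.3f}")
```

Repeating the last steps for several values of `n_components` gives the kind of comparison across numbers of dimensions that the abstract mentions. Scores computed in spaces of different dimensionality are not strictly comparable, so such a comparison is indicative only.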

Highlights

  • Due to increased competition between telecommunication operators and a growing customer churn rate, telecommunication companies are seeking to improve customer loyalty.

  • Clustering evaluation metrics: different measures were used for the final evaluation of the clustering algorithm's performance, and they fall into two main types [51].

  • We deal with unlabeled data, so we review some internal measures that can be applied to clustering algorithms to evaluate the quality of the clustering results.


Summary

Introduction

Due to increased competition between telecommunication operators and a growing customer churn rate, telecommunication companies are seeking to improve customer loyalty. To increase customer satisfaction, most telecom companies resort to customer segmentation, which entails separating the targeted customers into groups based on demographic or usage attributes, including gender, age group, buying behavior, usage pattern, special interests, and other features that represent the customer. Big data is one of the most common topics in current research, and its techniques are applied in different fields, such as the telecom industry [1], to support strategic decisions. In clustering, objects in one cluster are likely to differ from objects grouped in another cluster. Clustering is one of the most fundamental processes for analyzing unlabeled data and is applied in a wide range of areas such as computer vision [2,3,4], natural language processing [5,6,7], and bioinformatics [8, 9]. A Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters, and it attempts to infer the distribution of the data.
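For reference, the Gaussian mixture assumption described above is conventionally written as the density below. The notation is standard textbook notation and is an assumption here, not taken from the source: K is the number of components, π_k are the mixing weights, and μ_k and Σ_k are the mean and covariance of component k.

```latex
p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```

Fitting the model means estimating the unknown parameters π_k, μ_k, and Σ_k from the data, typically with the expectation-maximization algorithm.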

