Abstract
Customer churn is a common problem in many industries, including telecommunications, and has driven the development of advanced techniques for predicting and preventing it. The availability of large volumes of stored customer data (big data), together with well-tuned machine learning (ML) algorithms, has made it possible to extract useful features associated with customer behaviour and, consequently, to predict customer churn. An effective way to further improve the churn prediction capability of different ML algorithms is topological data analysis (TDA), a framework that applies topological methods to uncover hidden structural features in complex, high-dimensional data. Here, a TDA summary of the 0- and 1-dimensional holes in the data, called barcode statistics, was extracted and added as a feature to the preprocessed customer data. To address the effective preprocessing and analysis of large customer datasets and the effective tuning of ML hyperparameters, we implemented an advanced data preprocessing pipeline comprising the handling of missing data, feature engineering, encoding of categorical features with the hashing encoding method, and feature selection. Without barcode statistics, the XGBoost algorithm with tuned hyperparameters achieved the best results, with an accuracy of 92.71%, precision of 85.95%, recall of 92.71%, and F-measure of 89.20%. With barcode statistics as an additional feature, the XGBoost algorithm with tuned hyperparameters again performed best, with markedly improved results: an accuracy, precision, recall, and F-measure of 98.50% each. The use of TDA barcode statistics thus substantially improved the churn prediction capability of the ML algorithms.
The results also indicate that hyperparameter tuning is not needed when an effective data preprocessing technique is used or when barcode statistics are included. The best accuracy obtained in this work (98.5%) is in line with the best accuracy of 98.7% reported in a related work; notably, however, the best precision obtained here (98.5%) exceeds the 94.3% precision reported in that work, despite its higher accuracy.
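The abstract does not say how the barcode statistics were computed. As a minimal sketch, assuming a Vietoris-Rips filtration with Euclidean distance and summary statistics of bar lifetimes (count, mean, standard deviation, maximum), the Python code below computes the 0-dimensional barcode with a union-find pass over edges sorted by length. The 1-dimensional bars also mentioned above require a persistent homology library such as Ripser or GUDHI and are omitted to keep the example self-contained. The helper `add_local_barcode_features` and its k-nearest-neighbour construction are hypothetical: they illustrate one way a per-customer barcode feature could be appended to preprocessed data and are not the authors' method.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Finite 0-dimensional persistence bars (birth, death) of a Vietoris-Rips
    filtration on `points`, computed with union-find.

    Every point is born at scale 0; each time an edge of length d first joins
    two components, one of them dies at d. The one component that never dies
    (the infinite bar) is omitted.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the component root, halving paths as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first (Kruskal-style sweep).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, d))
    return bars


def barcode_statistics(bars):
    """Assumed summary: count, mean, population std and max of bar lifetimes."""
    lifetimes = [death - birth for birth, death in bars]
    if not lifetimes:
        return {"count": 0, "mean": 0.0, "std": 0.0, "max": 0.0}
    mean = sum(lifetimes) / len(lifetimes)
    var = sum((x - mean) ** 2 for x in lifetimes) / len(lifetimes)
    return {
        "count": len(lifetimes),
        "mean": mean,
        "std": math.sqrt(var),
        "max": max(lifetimes),
    }


def add_local_barcode_features(rows, k=4):
    """HYPOTHETICAL construction: for each row, take the row and its k-1
    nearest neighbours as a point cloud and append the mean and max H0
    lifetimes as two extra features."""
    augmented = []
    for x in rows:
        neighbours = sorted(rows, key=lambda y: math.dist(x, y))[:k]
        stats = barcode_statistics(h0_barcode(neighbours))
        augmented.append(list(x) + [stats["mean"], stats["max"]])
    return augmented
```

For the three points (0, 0), (1, 0) and (0, 3), `barcode_statistics(h0_barcode(...))` returns `{'count': 2, 'mean': 2.0, 'std': 1.0, 'max': 3.0}`: the first two points merge at distance 1 and the third joins them at distance 3.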