Analysis and Implementation of Abnormal User Data for Large Scale Communication Based on Spark

Siqin Huang, Wei Fang

doi:10.1109/dcabes.2018.00043

Abstract

Abnormal communication refers to abnormal user behavior in call traffic, service management, and other daily consumption. Communication operators hold large-scale user data sets, and using these data sets well to guide and recommend services can bring better economic benefits. However, for large-scale user feature data sets, serial machine learning and analysis methods spend a long time on feature processing, and model training carries a large time cost. To process and train on abnormal user data more efficiently, this paper uses Spark to implement feature engineering and to analyze large-scale abnormal user data sets, highlighting Spark's efficiency in analyzing feature data, and implements distributed training to accelerate model training. Distributed SVM training is taken as the example and compared with a stand-alone serial SVM and the scikit-learn SVM. The experimental results show the advantages of distributed computing as well as good training results. Finally, the common logistic regression and Bayesian algorithms, together with other distributed computing models, are used to compare training performance.
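As an illustration of the data-parallel training idea summarized above, the following is a minimal sketch in plain Python; it is not the authors' Spark code. It assumes a linear SVM trained by subgradient descent on the hinge loss, where each partition computes a partial subgradient (the map step) and the partials are summed before the weight update (the reduce step). The function names, the synthetic data, and the hyperparameters are all hypothetical, and Python lists stand in for Spark partitions.

```python
import random


def make_data(n, seed=0):
    """Hypothetical synthetic data: separable 2-D points, labels in {-1, +1}."""
    rng = random.Random(seed)
    data = []
    while len(data) < n:
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        margin = x[0] + x[1]
        if abs(margin) > 0.2:  # keep a gap around the separating line
            data.append((x, 1 if margin > 0 else -1))
    return data


def partition(data, k):
    """Split the data into k chunks that stand in for Spark partitions."""
    return [data[i::k] for i in range(k)]


def partial_subgradient(chunk, w, b):
    """Map step: hinge-loss subgradient summed over one partition."""
    gw = [0.0] * len(w)
    gb = 0.0
    for x, y in chunk:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * score < 1:  # only margin violators contribute
            for j, xj in enumerate(x):
                gw[j] -= y * xj
            gb -= y
    return gw, gb


def aggregate(partials, dim):
    """Reduce step: sum the per-partition subgradients."""
    gw = [sum(p[0][j] for p in partials) for j in range(dim)]
    gb = sum(p[1] for p in partials)
    return gw, gb


def train(parts, dim, epochs=200, lr=0.5, reg=0.01):
    """Full-batch subgradient descent with L2 regularization on w."""
    n = sum(len(p) for p in parts)
    w, b = [0.0] * dim, 0.0
    for t in range(1, epochs + 1):
        partials = [partial_subgradient(p, w, b) for p in parts]
        gw, gb = aggregate(partials, dim)
        step = lr / t ** 0.5  # decaying step size
        w = [wi - step * (gwi / n + reg * wi) for wi, gwi in zip(w, gw)]
        b -= step * gb / n
    return w, b


def accuracy(data, w, b):
    """Fraction of points whose predicted sign matches the label."""
    hits = 0
    for x, y in data:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        hits += (1 if score > 0 else -1) == y
    return hits / len(data)


if __name__ == "__main__":
    data = make_data(200, seed=1)
    w, b = train(partition(data, 4), dim=2)
    print("weights:", w, "bias:", b, "accuracy:", accuracy(data, w, b))
```

Because the summed partial subgradients equal the subgradient over the full data set, the number of partitions affects only how the work is split, not the update itself. In a real Spark job the map and reduce steps would run on executors rather than in a local loop.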

Full Text