Abstract

The digital economy is booming, but cybercrime and telecommunication fraud keep emerging. Detecting fraudulent behaviour and preventing such crimes is a significant challenge. This paper conducts data mining and analysis on a bank card telecommunication fraud data set. First, data mining and feature engineering are carried out on the given data set: data integrity is checked, overall descriptive statistics are computed, the data are standardized with the Z-Score method, feature correlations are explored with the Pearson correlation coefficient, the data set is balanced with the SMOTE method, and finally the data are divided into a training set and a test set. Subsequently, four machine learning classification models (logistic regression, KNN, decision tree and XGBoost) are established to make preliminary predictions and classify fraudulent behaviour. To mine the data set further, the optimal configuration of each of the four models is obtained by grid search with cross-validation. In the experiments, the logistic regression, KNN, decision tree and XGBoost classification models achieve prediction accuracy rates on the test set of 93.45%, 99.85%, 99.92% and 99.94%, respectively. This preliminarily suggests that the XGBoost and decision tree classification models have excellent classification capability. The four tuned models are then further evaluated on the test set using three performance indicators: prediction accuracy, recall rate and F1 value. Comparative analysis shows that the XGBoost classification model performs best. Owing to its classification ability, strong generalization ability and robustness, it is selected as the final bank card telecommunication fraud prediction model. In addition, the P-R curve and ROC curve of the classification results are drawn to present the performance evaluation intuitively. This analysis further indicates that XGBoost has better generalization ability.
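The evaluation indicators named above can be illustrated with a minimal sketch. It is not the paper's code. It assumes the usual binary definitions (fraud as the positive class) and uses made-up toy labels. The function name `classification_metrics` is hypothetical. The sketch shows why recall and F1 are reported alongside accuracy when fraud cases are rare.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task.

    Assumption: standard confusion-matrix definitions, with `positive`
    marking the fraud class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented for illustration: 1 = fraud, 0 = legitimate.
y_true = [0] * 95 + [1] * 5

# A degenerate classifier that never flags fraud.
y_pred_all_legit = [0] * 100

# A classifier with one false alarm and one missed fraud case.
y_pred_model = [0] * 94 + [1] + [1, 1, 1, 1, 0]

baseline = classification_metrics(y_true, y_pred_all_legit)
model = classification_metrics(y_true, y_pred_model)
print(baseline)
print(model)
```

On these toy labels the classifier that never flags fraud still reaches 95% accuracy while its recall and F1 are zero, which is why accuracy alone says little on imbalanced fraud data.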
