Abstract

Analyzing the features of fraudulent websites is of great significance for combating, preventing, and controlling telecom fraud crimes. Existing analytical approaches have two shortcomings: they rely on a single dimension, and they are vulnerable to anti-reconnaissance. To address these, this paper adopts the Stacking ensemble learning algorithm, combines multiple modalities (text, image, and URL), and proposes a multimodal fraudulent website identification method that ensembles heterogeneous models. Cross-validation is first used to train multiple strong and largely different base classifiers, such as a BERT model, a residual neural network (ResNet), and a logistic regression model. These classify the text, image, and URL features, respectively. The outputs of the base classifiers are then taken as the input of the meta-classifier, whose output serves as the final identification. The study indicates that the fusion method identifies fraudulent websites more effectively than single-modal methods, increasing recall by at least 1%. In addition, deploying the algorithm in a real Internet environment shows an improvement in identification accuracy of at least 1.9% compared with other fusion methods.
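
The Stacking procedure described above (base classifiers trained with cross-validation, whose outputs feed a meta-classifier) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: small logistic regressions stand in for the BERT and ResNet base classifiers, the meta-classifier is also taken to be a logistic regression (the abstract does not name it), the features are synthetic, and the helper names `fit_logreg`, `stacking_fit`, and `stacking_predict` are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logreg(X, y, lr=0.5, epochs=200):
    """Logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def stacking_fit(modalities, y, k=5):
    """modalities: one feature matrix per modality (e.g. text, image, URL)."""
    n = len(y)
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved k folds
    meta_X = [[0.0] * len(modalities) for _ in range(n)]
    for m, X in enumerate(modalities):
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(n) if i not in held]
            model = fit_logreg([X[i] for i in train], [y[i] for i in train])
            # Out-of-fold predictions: each sample is scored by a base model
            # that never saw it, so the meta-classifier is not fed leaked labels.
            preds = predict_proba(model, [X[i] for i in held_out])
            for i, p in zip(held_out, preds):
                meta_X[i][m] = p
    # Refit every base classifier on all data for use at prediction time.
    base_models = [fit_logreg(X, y) for X in modalities]
    meta_model = fit_logreg(meta_X, y)
    return base_models, meta_model

def stacking_predict(base_models, meta_model, modalities):
    cols = [predict_proba(bm, X) for bm, X in zip(base_models, modalities)]
    meta_X = [list(row) for row in zip(*cols)]
    return predict_proba(meta_model, meta_X)

# Synthetic stand-in data: label 1 = fraudulent, 0 = benign (assumption).
random.seed(0)
n = 120
y = [random.randint(0, 1) for _ in range(n)]

def fake_modality(signal):
    # One informative feature plus one pure-noise feature per sample.
    return [[(2 * yi - 1) * signal + random.gauss(0, 1), random.gauss(0, 1)]
            for yi in y]

modalities = [fake_modality(1.0) for _ in range(3)]  # "text", "image", "URL"
base_models, meta_model = stacking_fit(modalities, y)
probs = stacking_predict(base_models, meta_model, modalities)
accuracy = sum((p > 0.5) == bool(yi) for p, yi in zip(probs, y)) / n
print(f"training accuracy of the stacked model: {accuracy:.2f}")
```

The key design point is that the meta-classifier is trained only on out-of-fold base predictions. In a realistic setting each base classifier would be replaced by the modality-specific model, and performance would be measured on held-out websites instead of the training set used here.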

