Abstract

Analyzing the features of fraudulent websites is of great significance to combating, preventing and controlling telecom fraud. Existing analytical approaches rely on a single dimension and are vulnerable to anti-reconnaissance measures. To address these shortcomings, this paper adopts the Stacking ensemble learning algorithm, combines multiple modalities (text, image and URL), and proposes a multimodal fraudulent website identification method that ensembles heterogeneous models. Multiple strong and mutually diverse base classifiers, such as a BERT model, a residual neural network (ResNet) and a logistic regression model, are first trained with cross-validation to classify the text, image and URL features respectively. The outputs of the base classifiers are then taken as the input of the meta-classifier, whose output serves as the final identification result. The study indicates that the fusion method identifies fraudulent websites more effectively than single-modal methods, increasing recall by at least 1%. In addition, when deployed in a real Internet environment, the algorithm improves identification accuracy by at least 1.9% compared with other fusion methods.
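
The stacking flow described in the abstract (cross-validated base classifiers per modality, whose outputs feed a meta-classifier) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's implementation: the three modalities are replaced by synthetic scalar features, the BERT, ResNet and logistic regression base models are replaced by a toy nearest-centroid scorer, and the meta-classifier is taken to be a small logistic regression, which the abstract does not specify. All function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_centroid(xs, ys):
    """Toy base learner (stand-in for BERT/ResNet/LR): class means of one feature."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return sum(neg) / len(neg), sum(pos) / len(pos)

def predict_centroid(model, x):
    """Score in (0, 1); higher means closer to the fraudulent-class mean."""
    mu_neg, mu_pos = model
    return sigmoid(abs(x - mu_neg) - abs(x - mu_pos))

def out_of_fold_scores(xs, ys, k=5):
    """Score each sample with a base model that never saw it during training."""
    n = len(xs)
    scores = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit_centroid([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, n, k):
            scores[i] = predict_centroid(model, xs[i])
    return scores

def fit_logistic(rows, ys, lr=0.5, epochs=300):
    """Meta-classifier (assumed): logistic regression via batch gradient descent."""
    w, b, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for row, y in zip(rows, ys):
            err = sigmoid(b + sum(wi * ri for wi, ri in zip(w, row))) - y
            gb += err
            for j, rj in enumerate(row):
                gw[j] += err * rj
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def make_data(n, rng):
    """Synthetic stand-in: one scalar feature per modality (text, image, URL)."""
    ys = [i % 2 for i in range(n)]  # 1 = fraudulent, 0 = benign
    cols = [[2.0 * y + rng.gauss(0.0, s) for y in ys] for s in (1.0, 1.2, 1.5)]
    return cols, ys

def run_demo(seed=0):
    rng = random.Random(seed)
    train_cols, train_y = make_data(300, rng)
    test_cols, test_y = make_data(200, rng)
    # Level 0: out-of-fold scores from one base classifier per modality.
    oof = [out_of_fold_scores(col, train_y) for col in train_cols]
    # Level 1: the meta-classifier learns how to weight the base outputs.
    w, b = fit_logistic([list(r) for r in zip(*oof)], train_y)
    # Refit each base classifier on all training data before scoring new samples.
    bases = [fit_centroid(col, train_y) for col in train_cols]
    correct = 0
    for i, y in enumerate(test_y):
        row = [predict_centroid(m, col[i]) for m, col in zip(bases, test_cols)]
        p = sigmoid(b + sum(wi * ri for wi, ri in zip(w, row)))
        correct += int((p >= 0.5) == (y == 1))
    return correct / len(test_y)

if __name__ == "__main__":
    print("fused accuracy on synthetic data:", run_demo())
```

Training the meta-classifier on out-of-fold scores rather than in-sample scores is the point of the cross-validation step: it keeps the meta-classifier from trusting a base model that has merely memorized its own training data.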
