MEDAL: A Multimodality-Based Effective Data Augmentation Framework for Illegal Website Identification

Li Wen, Chenyang Wang, Huimin Ma, Pengfei Xue, Wanmeng Ding, Min Zhang, Bingyang Guo, Jinghua Zheng

doi:10.3390/electronics13112199

Abstract

The emergence of illegal websites (gambling, pornography, and attraction) seriously threatens the security of society. Because illegal websites conceal themselves, labeled data are difficult to obtain in large quantities. Moreover, most illegal websites disguise themselves to avoid detection; for example, some gambling websites visually resemble gaming websites. Existing methods, however, overlook camouflage within a single modality. To address these problems, this paper proposes MEDAL, a multimodality-based effective data augmentation framework for illegal website identification. First, we establish an illegal website identification framework based on tri-training that combines information from different modalities (image, text, and HTML) while making full use of abundant unlabeled data. Then, we design a multimodal mutual assistance module, integrated with the tri-training framework, to mitigate the erroneous information introduced during tri-training when the single-modal classifiers perform unevenly. Finally, experimental results on a self-developed dataset demonstrate the effectiveness of the proposed framework, which performs well on accuracy, precision, recall, and F1 metrics.
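As a rough illustration of the tri-training idea mentioned in the abstract, the sketch below applies the classical agreement rule: an unlabeled sample is pseudo-labeled for one classifier when the other two classifiers predict the same label. This is not the authors' implementation and does not model the multimodal mutual assistance module, whose details the abstract does not give. The function name `pseudo_labels_by_agreement`, the modality keys, the example predictions, and the label convention (1 for illegal, 0 for benign) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def pseudo_labels_by_agreement(
    predictions: Dict[str, List[int]],
) -> Dict[str, List[Tuple[int, int]]]:
    """Classical tri-training agreement rule (illustrative sketch).

    `predictions` maps each of three classifier names to its predicted
    labels on the same unlabeled samples. For every classifier, return
    the (sample_index, label) pairs on which the OTHER two agree; these
    would be candidate pseudo-labeled samples for retraining it.
    """
    names = list(predictions)
    if len(names) != 3:
        raise ValueError("tri-training expects exactly three classifiers")
    n_samples = len(predictions[names[0]])
    if any(len(predictions[name]) != n_samples for name in names):
        raise ValueError("all classifiers must predict on the same samples")

    result: Dict[str, List[Tuple[int, int]]] = {name: [] for name in names}
    for target in names:
        # The two peers that "teach" the target classifier.
        peer_a, peer_b = [name for name in names if name != target]
        for i in range(n_samples):
            if predictions[peer_a][i] == predictions[peer_b][i]:
                result[target].append((i, predictions[peer_a][i]))
    return result


# Hypothetical predictions on four unlabeled websites
# (assumed convention: 1 = illegal, 0 = benign).
example_predictions = {
    "image": [1, 0, 1, 0],
    "text": [1, 0, 0, 1],
    "html": [1, 1, 0, 0],
}
pseudo = pseudo_labels_by_agreement(example_predictions)
for modality, pairs in pseudo.items():
    print(modality, pairs)
```

In this toy input, only sample 0 is labeled consistently by all three modalities; every other sample is pseudo-labeled for exactly one classifier, namely the one that disagrees with its two peers. A weak single-modal classifier can therefore pass wrong labels to its peers whenever it happens to agree with another classifier, which is the kind of error propagation the abstract says the mutual assistance module is meant to mitigate.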

Journal: Electronics
Publication Date: Jun 5, 2024
License type: CC BY 4.0
