MMPC-RF: A Deep Multimodal Feature-Level Fusion Architecture for Hybrid Spam E-mail Detection

Ghizlane Hnini, Ali Yahyaouy, Hamid Tairi, Jamal Riffi, Mohamed Adnane Mahraz

doi:10.3390/app112411968

Abstract

Hybrid spam is an undesirable e-mail (electronic mail) that contains both image and text parts. It is more harmful and complex than image-based or text-based spam e-mail. Thus, an efficient and intelligent approach is required to distinguish between spam and ham. To our knowledge, only a small number of studies have aimed at detecting hybrid spam e-mails. Most of these multimodal architectures adopted the decision-level fusion method, whereby the classification scores of each modality were concatenated and fed to another classification model to make a final decision. Unfortunately, this method not only demands many learning steps, but it also loses correlation in the mixed feature space. In this paper, we propose a deep multimodal feature-level fusion architecture that concatenates two embedding vectors to obtain a strong representation of e-mails and increase classification performance. The paragraph vector distributed bag of words (PV-DBOW) and the convolutional neural network (CNN) were used as feature extraction techniques for the text and image parts, respectively, of the same e-mail. The extracted feature vectors were concatenated and fed to the random forest (RF) model to classify a hybrid e-mail as either spam or ham. The experiments were conducted on three hybrid datasets built from three publicly available corpora: Enron, Dredze, and TREC 2007. According to the obtained results, the proposed model achieves an accuracy of 99.16%, higher than recent state-of-the-art methods.

Highlights

Spamming, which is defined as the behavior of sending unsolicited messages to a large number of people, is currently spreading rapidly.
The support vector machine (SVM), classical k-NN, and MMA-MF models achieved accuracies of 98.25%, 97.83%, and 98.42%, respectively, whereas our model achieved the best accuracy with 99.16%.
This paper proposed a multimodal architecture based on paragraph vector distributed bag of words (PV-DBOW) and convolutional neural network (CNN) models for hybrid spam e-mail detection.

Summary

Introduction

Spamming, which is defined as the behavior of sending unsolicited messages to a large number of people, is currently spreading rapidly. The authors of [4] fused two classification probability scores. These values were generated from the image and text parts by using a convolutional neural network (CNN) and a long short-term memory (LSTM) model, respectively. The second architecture applied decision-level fusion to classify an e-mail as either spam or ham. The main contribution of this paper is the proposal of a new multimodal architecture based on PV-DBOW, CNN, and RF. It generates feature vectors from both the text and the image of the same e-mail by using the PV-DBOW and CNN models, respectively. The two generated vectors are concatenated at the feature level before being fed into the RF model, which classifies the e-mail as either spam or ham.
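The feature-level fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the text and image vectors here are synthetic stand-ins for PV-DBOW and CNN embeddings, and the vector sizes, the helper name `fuse_features`, the train/test split, and the forest settings are all assumptions chosen for illustration. It relies on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical sizes; the real dimensions depend on the trained extractors.
N_EMAILS, TEXT_DIM, IMAGE_DIM = 200, 16, 32

labels = rng.integers(0, 2, size=N_EMAILS)  # 0 = ham, 1 = spam

# Synthetic stand-ins for PV-DBOW text vectors and CNN image vectors.
# The spam class is shifted by 1 so the toy data is learnable.
text_vecs = rng.normal(size=(N_EMAILS, TEXT_DIM)) + labels[:, None]
image_vecs = rng.normal(size=(N_EMAILS, IMAGE_DIM)) + labels[:, None]


def fuse_features(text_vecs, image_vecs):
    """Feature-level fusion: concatenate the two vectors of each e-mail."""
    return np.concatenate([text_vecs, image_vecs], axis=1)


fused = fuse_features(text_vecs, image_vecs)  # one row per e-mail

# A single random forest is trained on the fused representation.
train, test = slice(0, 150), slice(150, None)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(fused[train], labels[train])

predictions = clf.predict(fused[test])
accuracy = float((predictions == labels[test]).mean())
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Because the classifier sees both modalities in one vector, it can split on text and image features within the same tree. Decision-level fusion, by contrast, would train one classifier per modality and combine only their output scores.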

Related Works

Text-Based Feature Extraction Techniques

Image-Based Feature Extraction Techniques

The PV-DBOW Model

The CNN Model

Random Forest Classifier

Dataset

The MMPC-RF Architecture

Experimental Results and Comparative Study

Conclusions

Journal: Applied Sciences
Publication Date: Dec 16, 2021
Citations: 6
License type: CC BY 4.0
