Abstract

Hybrid spam is undesirable e-mail (electronic mail) that contains both image and text parts. It is more harmful and more complex than purely image-based or text-based spam. Thus, an efficient and intelligent approach is required to distinguish between spam and ham. To our knowledge, only a small number of studies have aimed at detecting hybrid spam e-mails. Most of these multimodal architectures adopted the decision-level fusion method, whereby the classification scores of each modality were concatenated and fed to another classification model to make a final decision. Unfortunately, this method not only demands many learning steps, but also loses the correlation between modalities in the mixed feature space. In this paper, we propose a deep multimodal feature-level fusion architecture that concatenates two embedding vectors to obtain a strong representation of e-mails and improve classification performance. The paragraph vector distributed bag of words (PV-DBOW) and the convolutional neural network (CNN) were used as feature extraction techniques for the text and image parts, respectively, of the same e-mail. The extracted feature vectors were concatenated and fed to a random forest (RF) model to classify a hybrid e-mail as either spam or ham. The experiments were conducted on three hybrid datasets built from three publicly available corpora: Enron, Dredze, and TREC 2007. According to the obtained results, the proposed model achieves an accuracy of 99.16%, higher than recent state-of-the-art methods.

Highlights

  • Spamming, which is defined as the behavior of sending unsolicited messages to a large number of people, is currently spreading rapidly

  • The support vector machine (SVM), classical k-NN, and MMA-MF models achieved accuracies of 98.25%, 97.83%, and 98.42%, respectively, whereas our model achieved the best accuracy, 99.16%

  • This paper proposed a multimodal architecture based on paragraph vector distributed bag of words (PV-DBOW) and convolutional neural network (CNN) models for hybrid spam e-mail detection


Summary

Introduction

Spamming, which is defined as the behavior of sending unsolicited messages to a large number of people, is currently spreading rapidly. [4] fused two classification probability scores. These values were generated from the image and text parts using a convolutional neural network (CNN) and a long short-term memory (LSTM) model, respectively. The second architecture applied decision-level fusion to classify an e-mail as either spam or ham. The main contribution of this paper is a new multimodal architecture based on PV-DBOW, CNN, and RF. It generates feature vectors from the text and the image of the same e-mail using the PV-DBOW and CNN models, respectively. The two generated vectors are concatenated at the feature level and then fed into the RF model, which classifies the e-mail as either spam or ham.
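The difference between the two fusion strategies can be illustrated with a minimal Python sketch. It is not the authors' implementation: the vector sizes, the random stand-in embeddings, the example scores, and the helper names `fuse_features` and `fuse_decisions` are all assumptions made for illustration. It only shows what the final classifier would receive in each case.

```python
import random

TEXT_DIM = 4   # hypothetical size of the PV-DBOW text embedding
IMAGE_DIM = 3  # hypothetical size of the CNN image embedding


def fuse_features(text_vec, image_vec):
    """Feature-level fusion: concatenate the two embeddings into one vector."""
    return list(text_vec) + list(image_vec)


def fuse_decisions(text_score, image_score):
    """Decision-level fusion: keep only one spam score per modality."""
    return [text_score, image_score]


rng = random.Random(0)
# Random stand-ins for the embeddings that PV-DBOW and the CNN would produce.
text_vec = [rng.random() for _ in range(TEXT_DIM)]
image_vec = [rng.random() for _ in range(IMAGE_DIM)]

fused = fuse_features(text_vec, image_vec)
scores = fuse_decisions(0.91, 0.40)  # made-up per-modality spam scores

print(len(fused))   # TEXT_DIM + IMAGE_DIM values reach the final classifier
print(len(scores))  # only two values reach the final classifier
```

With feature-level fusion, the final classifier (a random forest in the proposed architecture) sees every component of both embeddings in one vector, so it can learn from combinations of text and image features. With decision-level fusion, it sees only one score per modality.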

Related Works
Text-Based Feature Extraction Techniques
Image-Based Feature Extraction Techniques
The PV-DBOW Model
The CNN Model
Random Forest Classifier
Dataset
The MMPC-RF Architecture
Experimental Results and Comparative Study
Conclusions
