Abstract

E-mail is an efficient and reliable data exchange service. Spam consists of undesired e-mail messages that are sent indiscriminately in bulk, usually for commercial purposes. Obfuscated image spamming is one of the newer tricks for bypassing text-based and Optical Character Recognition (OCR)-based spam filters. Image spam detection based on visual features has the advantage of efficiency, reducing computational cost while improving performance. In this paper, an image spam detection scheme is presented. Suitable image processing techniques were used to capture the image features that differentiate spam images from non-spam ones. Weighted k-nearest neighbor, a simple yet powerful machine learning algorithm, was used as the classifier. The scheme was evaluated over two datasets, and the results confirm its effectiveness. The first is a real benchmark dataset, while the second is a realistic, modern, and more challenging dataset collected from social media and several publicly available image spam datasets. The obtained accuracy was 99.36% on the benchmark dataset and 91% on the proposed dataset.
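As a rough illustration of the weighted k-nearest neighbor idea named above, the following is a minimal sketch, not the paper's implementation. The abstract does not state the weighting function, the distance metric, the value of k, or the image features, so inverse-distance weighting, Euclidean distance, k = 3, and the two-dimensional toy feature vectors are all assumptions made for this example. The function name `weighted_knn_predict` is hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Classify `query` by a distance-weighted vote of its k nearest neighbors.

    train: list of (feature_vector, label) pairs.
    query: feature vector with the same length as the training vectors.
    """
    # Euclidean distance from the query to every training sample (assumed metric).
    neighbors = sorted(
        ((math.dist(features, query), label) for features, label in train),
        key=lambda pair: pair[0],
    )
    votes = defaultdict(float)
    for distance, label in neighbors[:k]:
        # Inverse-distance weight (an assumption): closer neighbors count more.
        votes[label] += 1.0 / (distance + eps)
    # Return the label with the largest total weight.
    return max(votes, key=votes.get)


# Toy feature vectors, invented purely for illustration.
toy_train = [
    ((0.0, 0.0), "ham"),
    ((0.0, 0.2), "ham"),
    ((1.0, 1.0), "spam"),
]
prediction = weighted_knn_predict(toy_train, (0.9, 0.9), k=3)
print(prediction)
```

With these toy points, two of the three neighbors are labeled "ham", so an unweighted majority vote would choose "ham". The single "spam" neighbor is much closer to the query, so the weighted vote favors "spam" instead.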

Highlights

  • E-mail is a reliable and popular communication medium that provides a free, or very cheap, and fast service

  • The term False Positive (FP) represents the number of non-spam e-mails identified as spam, while False Negative (FN) is the number of spam e-mails that are misclassified as non-spam [2]

  • Accuracy is given in terms of True Positive (TP), FP, True Negative (TN) and FN as Accuracy = (TP + TN) / (TP + TN + FP + FN)
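The accuracy definition in the highlights can be sketched as a small function. The counts used below are hypothetical values chosen only to show the arithmetic and are not results from the paper.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return (TP + TN) / (TP + TN + FP + FN), the fraction classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one classified e-mail is required")
    # The numerator is summed before dividing by the total number of e-mails.
    return (tp + tn) / total


# Hypothetical counts: 90 spam caught, 95 ham passed, 5 ham flagged, 10 spam missed.
print(accuracy(tp=90, tn=95, fp=5, fn=10))  # 0.925
```

Here 185 of the 200 e-mails are classified correctly, giving an accuracy of 0.925.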


Summary

Introduction

E-mail is a reliable and popular communication medium that provides a fast service at little or no cost. Spammers have developed new tricks, such as image spam, in which the spam text is embedded within an image. A simple countermeasure is to use OCR to convert an image's textual content into plain text, which keyword-based filters can then classify as spam or non-spam (ham). To defeat OCR, spammers apply obfuscation tricks (such as adding noise or complex backgrounds) with the goal of keeping the spam image readable by humans but unreadable by machines. This has led to a new generation of spam filters based on image visual characteristics [4].

