Spam Email Image Classification Based on Text and Image Features

Estqlal Hammad Dhah, Suhad A Ali, Mohammed Abdullah Naser

doi:10.1109/cas47993.2019.9075725

Abstract

Filtering image-based spam email remains a major challenge for researchers. This paper proposes a method based on two observations: spam images contain a large proportion of text whose characteristics differ from those of other image types, and the features of spam images are highly similar to one another. These observations can be used to distinguish the text regions of spam images from those of other images. A hybrid method is proposed that combines a feature vector from the text regions with features of the whole image. Two types of features are extracted. The first extraction method applies the local binary pattern (LBP) to extract image texture features directly, while the second extracts features from the text regions of the image only. The extracted features are used both individually and in combination to train the classifiers. A one-class KNN classifier and a two-class KNN classifier are applied separately. Each classifier was used in three configurations: with the text-region features, with the image texture features, and with both feature sets merged. Experimental results showed that using image and text features together improves classification effectiveness compared with using only image features or only text features.
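The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the basic 3x3 LBP operator with a 256-bin histogram, simple concatenation as the merging step, and a two-class KNN with Euclidean distance and majority voting. The function names (`lbp_histogram`, `combine_features`, `knn_predict`) and the value `k=3` are hypothetical, and the text-region features are treated as an already computed numeric vector because the abstract does not specify them.

```python
import math
from collections import Counter

# Offsets of the 8 neighbours, clockwise from the top-left pixel (assumed ordering).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(image):
    """Return a normalised 256-bin histogram of basic 3x3 LBP codes.

    `image` is a 2D list of grayscale values; border pixels are skipped.
    """
    hist = [0] * 256
    height, width = len(image), len(image[0])
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOURS):
                # A neighbour at least as bright as the center sets its bit.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist) or 1  # avoid division by zero for images smaller than 3x3
    return [count / total for count in hist]


def combine_features(texture_features, text_features):
    """Merge the two feature vectors by concatenation (an assumed merging rule)."""
    return list(texture_features) + list(text_features)


def knn_predict(train_vectors, train_labels, query, k=3):
    """Two-class KNN: majority vote among the k nearest training vectors."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

In this sketch, the three configurations from the abstract correspond to passing `knn_predict` either the LBP histogram alone, the text-region vector alone, or the output of `combine_features`.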

Full Text