Abstract

This paper introduces a multimodal framework for clustering spam images received in unsolicited emails. Spam images in the same cluster have similar visual and textual content and may have been generated by a common spam source. To perform the clustering task, we first extract three main categories of features: 1) visual features, extracted by pretrained convolutional neural networks (CNNs); 2) layout features, which describe the location of illustrations in the spam images; and 3) text features, extracted by an optical character recognition (OCR) algorithm. We then use a two-stage hierarchical clustering framework to form clusters based on the pairwise similarity matrices of the extracted features. We evaluate the proposed approach on a dataset of 2,100 spam images collected from three months of emails. The experimental results show that the proposed method achieves satisfactory clustering outcomes as measured by the V-measure, an external entropy-based metric.
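For readers unfamiliar with the evaluation metric, the V-measure is the harmonic mean of homogeneity (each cluster contains members of a single class) and completeness (all members of a class fall in the same cluster), both defined through conditional entropies. The following is a minimal sketch of the standard definition, not the authors' evaluation code; the function names and the toy labels are assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """Conditional entropy H(target | given) in nats."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(classes, clusters, beta=1.0):
    """V-measure of a clustering against ground-truth classes.

    classes  -- ground-truth labels (hypothetical here, e.g. spam source IDs)
    clusters -- labels assigned by the clustering algorithm
    beta     -- weight of completeness relative to homogeneity
    """
    h_classes = entropy(classes)
    h_clusters = entropy(clusters)
    # By convention, a zero-entropy labeling yields a perfect score of 1.
    homogeneity = (
        1.0 if h_classes == 0
        else 1.0 - conditional_entropy(classes, clusters) / h_classes
    )
    completeness = (
        1.0 if h_clusters == 0
        else 1.0 - conditional_entropy(clusters, classes) / h_clusters
    )
    if homogeneity + completeness == 0:
        return 0.0
    return (1 + beta) * homogeneity * completeness / (
        beta * homogeneity + completeness
    )


# Toy example: cluster IDs are permuted but the grouping matches the classes.
true_sources = [0, 0, 1, 1]
predicted = [1, 1, 0, 0]
score = v_measure(true_sources, predicted)
```

Because the metric is external, it requires ground-truth labels for the images; it is invariant to how the cluster IDs are numbered, so only the grouping matters.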
