Enhancing Multimodal Clustering Framework with Deep Learning to Reveal Image Spam Authorship

Wei-Bang Chen, Yongjin Lu, Zanyah Ailsworth, Chengcui Zhang, Xiaoliang Wang

doi:10.1109/iri51335.2021.00032

Abstract

This paper introduces a multimodal framework for clustering spam images received in unsolicited emails. Spam images in the same cluster have similar visual and textual content and may have been generated by a common spam source. To perform the clustering task, we first extract three main categories of features: 1) visual features, extracted by pretrained convolutional neural networks (CNNs); 2) layout features, which describe the location of illustrations in the spam images; and 3) text features, extracted by an optical character recognition (OCR) algorithm. We then use a two-stage hierarchical clustering framework to form clusters based on the pairwise similarity matrices of the extracted features. We evaluate the performance of the proposed approach on a dataset of 2,100 spam images collected from three months of emails. The experimental results show that the proposed method achieves satisfactory clustering outcomes as measured by the V-measure, an external entropy-based metric.
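For readers unfamiliar with the evaluation metric, the V-measure is commonly defined as the harmonic mean of homogeneity (each cluster contains images from a single source) and completeness (all images from one source fall in the same cluster), both derived from conditional entropies. The following is a minimal sketch of that standard definition in plain Python. It assumes ground-truth spam-source labels are available for each image; the function names and the toy labels are illustrative and are not taken from the paper.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy H(labels), natural log."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """Conditional entropy H(target | given)."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(
        (c / n) * log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(true_sources, clusters):
    """Harmonic mean of homogeneity and completeness (equal weighting)."""
    if len(true_sources) != len(clusters):
        raise ValueError("label sequences must have the same length")
    h_true = entropy(true_sources)
    h_clus = entropy(clusters)
    # By convention, a degenerate labeling with zero entropy scores 1.0.
    homogeneity = (
        1.0 if h_true == 0
        else 1.0 - conditional_entropy(true_sources, clusters) / h_true
    )
    completeness = (
        1.0 if h_clus == 0
        else 1.0 - conditional_entropy(clusters, true_sources) / h_clus
    )
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Hypothetical toy example: four images from two assumed spam sources.
sources = ["a", "a", "b", "b"]
perfect = v_measure(sources, [0, 0, 1, 1])   # clusters match sources
lumped = v_measure(sources, [0, 0, 0, 0])    # everything in one cluster
```

A score near 1 indicates that clusters align closely with the assumed sources, while lumping all images into one cluster is complete but not homogeneous and scores 0 in this toy case.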

Full Text