Abstract
Social media platforms usually contain several modified versions of an image. This proliferation of versions calls the trustworthiness of social media images into question. We propose a novel framework to find modified versions of social media images using only their metadata. We consider several aspects to determine whether an image is a modified version of another image: the topic of the image, its spatio-temporal information, and its semantic similarity to other images. First, we apply topic modeling to find images linked to the same context. Second, we perform spatio-temporal clustering to group images that are close in space and time. Finally, we perform hierarchical clustering to form more precise clusters of versions. Notably, the proposed framework also considers modifications introduced in an image’s metadata when determining versions of the image. Modifications make it difficult to cluster versions together correctly, because a version may deviate significantly from its original image. We address this issue by exploring inconsistencies in the image metadata, which reflect the changes made to an image. We validate our model on a fact-checked image verification corpus and the Multimodal C4 dataset. The framework achieves around 95% accuracy, which supports the effectiveness of the proposed approach.
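The three stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the `ImageMeta` record and its fields are hypothetical, the topic label is assumed to come from an upstream topic model, caption-token Jaccard overlap stands in for semantic similarity, and all thresholds are arbitrary. The last stage uses connected components under a similarity threshold, which is equivalent to single-linkage hierarchical clustering cut at that threshold. The metadata-inconsistency analysis is not covered.

```python
from dataclasses import dataclass
from itertools import combinations
from math import asin, cos, radians, sin, sqrt


@dataclass
class ImageMeta:
    # Hypothetical metadata record; field names are illustrative only.
    image_id: str
    topic: str        # assumed output of an upstream topic model
    lat: float
    lon: float
    timestamp: float  # seconds since epoch
    caption: str


def haversine_km(a, b):
    """Great-circle distance between two records, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def jaccard(a, b):
    """Token overlap of two captions; a crude stand-in for semantic similarity."""
    ta, tb = set(a.caption.lower().split()), set(b.caption.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def connected_groups(items, linked):
    """Merge items that are pairwise linked, directly or transitively."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if linked(items[i], items[j]):
            parent[find(i)] = find(j)
    groups = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


def find_versions(records, max_km=5.0, max_seconds=3600.0, min_sim=0.5):
    """Return clusters of image ids that are candidate versions of one another."""
    # Stage 1: keep only images that share a topic.
    by_topic = {}
    for r in records:
        by_topic.setdefault(r.topic, []).append(r)

    def near(a, b):
        return haversine_km(a, b) <= max_km and abs(a.timestamp - b.timestamp) <= max_seconds

    def similar(a, b):
        return jaccard(a, b) >= min_sim

    clusters = []
    for topic_group in by_topic.values():
        # Stage 2: group images that are close in space and time.
        for st_group in connected_groups(topic_group, near):
            # Stage 3: single-linkage clustering cut at the similarity threshold.
            clusters.extend(connected_groups(st_group, similar))
    return [sorted(r.image_id for r in c) for c in clusters]
```

Calling `find_versions` on a list of `ImageMeta` records returns one sorted list of image ids per candidate version cluster; images that match nothing end up in singleton clusters.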
Highlights
Social media platforms have become crucial for sharing real-life event updates [1–3]
We evaluate our framework on two real image metadata datasets: the Multimodal C4 (MMC4) dataset and the image verification corpus
Its selection is driven by its ability to harness the rich and diverse metadata provided by the Multimodal C4 dataset, ensuring
Summary
Social media platforms have become crucial for sharing real-life event updates [1–3]. The number of active social media users surpassed 4 billion by 2019 [4], and around 350 million new photos are uploaded daily on Facebook [5]. These social media images may contain useful.
Qijun He and Muhammad Umair contributed to this work.