Owing to the widespread use of smartphones and mobile devices and the prevalence of image-sharing social network services, the amount of image data available on the Web is soaring. Tasks such as image classification, detection, and segmentation rely on large amounts of image data to train machine learning models. Using these trained models, a visual feature representation vector can be extracted from each image and subsequently used in applications such as image retrieval, object detection, and clustering. However, despite the increasing demand for such analyses, few studies have examined the information contained in these image datasets as a whole, especially for extracting topics, trends, and opinions from images generated by online communities. Therefore, we propose a novel approach to image topic modeling that accounts for visual content as well as semantic information by leveraging an image captioning model. In addition, we propose an image–caption scoring model that measures the semantic similarity between an image and its generated caption, which is used to filter out noisy data that would otherwise obscure the semantic meaning of the topics extracted from the dataset. The results show that our proposed method assists in analyzing large-scale image datasets without the need to manually check individual images. Further experimental results show that our methods are particularly beneficial for applications such as data visualization, image retrieval, and image tag recommendation in large-scale image dataset analysis.
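To make the filtering idea concrete, the following is a minimal sketch, not the scoring model proposed here. It assumes each image and its generated caption have already been embedded into vectors of the same dimension, and it substitutes plain cosine similarity for the learned image–caption score. The function names, the input layout, and the threshold of 0.5 are all hypothetical choices made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_noisy_pairs(pairs, threshold=0.5):
    """Keep only image-caption pairs whose embeddings are similar enough.

    pairs: iterable of (image_id, caption, image_vec, caption_vec).
    threshold: hypothetical cutoff; a real system would tune it on held-out data.
    Cosine similarity is a stand-in for a learned image-caption score.
    """
    kept = []
    for image_id, caption, image_vec, caption_vec in pairs:
        if cosine_similarity(image_vec, caption_vec) >= threshold:
            kept.append((image_id, caption))
    return kept
```

Under these assumptions, the captions that survive the filter would then be passed to a text topic model, so that topics are estimated only from captions that plausibly describe their images.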