Abstract

Automatically generating descriptions for disaster news images could accelerate the dissemination of disaster information and relieve news editors of tedious manual work. Image captioning algorithms are notable for generating captions directly from image content. However, current image captioning algorithms trained on existing caption datasets fail to describe disaster images with the fundamental news elements. In this paper, we develop DNICC19k, a large-scale Chinese caption dataset of disaster news images, for which we collected and annotated a large number of disaster-related news images. Furthermore, we propose a spatial-aware topic-driven captioning network (STCNet) that encodes the interrelationships between news objects and generates descriptive sentences related to news topics. STCNet first constructs a graph representation based on object feature similarity. Its graph reasoning module then uses spatial information to infer the weights for aggregating adjacent nodes according to a learnable Gaussian kernel function. Finally, news sentence generation is driven by the spatial-aware graph representations and the news topic distribution. Experimental results demonstrate that STCNet trained on DNICC19k not only automatically creates descriptive sentences related to news topics for disaster news images, but also outperforms benchmark models such as Bottom-up, NIC, Show, Attend and Tell, and AoANet on multiple evaluation metrics, achieving CIDEr and BLEU-4 scores of 60.26 and 17.01, respectively.
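To make the graph reasoning step concrete, the following is a minimal sketch of spatially modulated neighbor aggregation as described in the abstract: edge strengths come from object feature similarity and are reweighted by a learnable Gaussian kernel over pairwise spatial distances. The module name, the single-bandwidth parameterization, and the use of normalized box centers as spatial positions are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialGraphReasoning(nn.Module):
    """Sketch of spatial-aware graph aggregation over detected objects."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)           # node update after aggregation
        self.sigma = nn.Parameter(torch.ones(1))  # learnable Gaussian bandwidth (assumed form)

    def forward(self, feats, centers):
        # feats:   (N, dim) object region features
        # centers: (N, 2) normalized bounding-box centers (assumed spatial encoding)
        # Adjacency from pairwise feature similarity.
        sim = F.cosine_similarity(feats.unsqueeze(1), feats.unsqueeze(0), dim=-1)
        # Gaussian kernel over squared spatial distances.
        dist2 = torch.cdist(centers, centers).pow(2)
        kernel = torch.exp(-dist2 / (2 * self.sigma.pow(2) + 1e-8))
        # Spatially modulated aggregation weights over adjacent nodes.
        weights = F.softmax(sim * kernel, dim=-1)
        return F.relu(self.proj(weights @ feats))

The resulting node representations would then condition sentence generation together with the news topic distribution, per the abstract.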
