Image Captioning with multi-level similarity-guided semantic matching

Jiesi Li, Ning Xu, Weizhi Nie, Shenyuan Zhang

doi:10.1016/j.visinf.2021.11.003

Abstract

Image captioning is a cross-modal task that requires automatically generating coherent natural-language sentences to describe the contents of an image. Because of the large gap between the vision and language modalities, most existing methods suffer from inaccurate semantic matching between images and generated captions. To address this problem, this paper proposes a novel multi-level similarity-guided semantic matching method for image captioning, which fuses local and global semantic similarities to learn the latent semantic correlation between images and generated captions. Specifically, we extract semantic units containing fine-grained semantic information from the images and from the generated captions, respectively. Based on a comparison of these semantic units, we design a local semantic similarity evaluation mechanism. Meanwhile, we employ the CIDEr score to characterize the global semantic similarity. The local and global similarities are finally fused using reinforcement learning to guide model optimization toward better semantic matching. Quantitative and qualitative experiments on the large-scale MSCOCO dataset illustrate the superiority of the proposed method, which can achieve fine-grained semantic matching between images and generated captions.
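
As a rough illustration of how a local and a global similarity might be fused into a single reinforcement learning reward, the following minimal sketch may help. It is not the paper's implementation: the abstract does not give the local similarity formula, the fusion rule, or the training algorithm. The F1-style overlap of semantic units, the linear weighting in fused_reward, the weight value, and the baseline-subtracted advantage are all assumptions made for illustration, and every function name is hypothetical. The CIDEr score is passed in as a precomputed number rather than calculated here.

```python
from typing import Set


def local_similarity(image_units: Set[str], caption_units: Set[str]) -> float:
    """F1-style overlap between image and caption semantic units.

    Assumption: the abstract only says the units are "compared";
    this particular overlap measure is illustrative.
    """
    if not image_units or not caption_units:
        return 0.0
    matched = len(image_units & caption_units)
    if matched == 0:
        return 0.0
    precision = matched / len(caption_units)  # caption units grounded in the image
    recall = matched / len(image_units)       # image units covered by the caption
    return 2 * precision * recall / (precision + recall)


def fused_reward(local_sim: float, global_sim: float, weight: float = 0.5) -> float:
    """Linear fusion of local and global similarity.

    Assumption: both the linear form and the weight are hypothetical;
    global_sim stands for a precomputed CIDEr score.
    """
    return weight * local_sim + (1.0 - weight) * global_sim


def advantage(sampled_reward: float, baseline_reward: float) -> float:
    """Baseline-subtracted reward, as used in policy-gradient training.

    Assumption: the abstract does not say which baseline, if any, is used.
    """
    return sampled_reward - baseline_reward


# Hypothetical semantic units for one image and two candidate captions.
image_units = {"dog", "frisbee", "grass"}
sampled_units = {"dog", "frisbee", "park"}
baseline_units = {"dog"}

# The CIDEr values below are made-up inputs, not results from the paper.
sampled = fused_reward(local_similarity(image_units, sampled_units), global_sim=1.0)
baseline = fused_reward(local_similarity(image_units, baseline_units), global_sim=0.8)
print(advantage(sampled, baseline))
```

In this sketch, a caption that mentions more of the image's semantic units receives a higher reward even when its CIDEr score is similar to that of a competing caption, which is the kind of fine-grained signal the abstract attributes to the local similarity.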


Journal: Visual Informatics	Publication Date: Dec 1, 2021
Citations: 5	License type: cc-by-nc-nd


