Image-Text Retrieval With Cross-Modal Semantic Importance Consistency

Zejun Liu, Fanglin Chen, Jun Xu, Wenjie Pei, Guangming Lu

doi:10.1109/tcsvt.2022.3220297

Abstract

Cross-modal image-text retrieval is an important Vision-and-Language task that models the similarity of image-text pairs by embedding their features into a shared space for alignment. To bridge the heterogeneous gap between the two modalities, current approaches achieve inter-modal alignment and intra-modal semantic relationship modeling through complex weighted combinations of items. In both the intra-modal association and the inter-modal interaction processes, items with higher weights contribute more to the global semantics. However, the same item always contributes differently in the two processes, because most traditional approaches focus only on alignment. This usually results in semantic changes and misalignment. To address this issue, this paper proposes Cross-modal Semantic Importance Consistency (CSIC), which keeps the semantics of items invariant during alignment. The proposed technique measures the semantic importance of items obtained from intra-modal and inter-modal self-attention and learns a more reasonable representation vector by inter-calibrating the two importance distributions to improve performance. We conducted extensive experiments on the Flickr30K and MS COCO datasets. The results show that our approach significantly improves retrieval performance, demonstrating its superiority and rationality.
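The idea of comparing an item's importance across the two attention processes can be illustrated with a small sketch. This is not the paper's formulation, which the abstract does not spell out. It assumes that an item's importance is the average attention it receives, and that the two importance distributions are compared with a symmetric KL divergence. The function names (`intra_importance`, `inter_importance`, `consistency_loss`) and the toy feature vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def received_attention(queries, keys):
    """Average attention each key receives from the queries.

    The result is a probability distribution over the keys.
    """
    received = [0.0] * len(keys)
    for q in queries:
        attn = softmax([dot(q, k) for k in keys])
        for j, a in enumerate(attn):
            received[j] += a / len(queries)
    return received


def intra_importance(items):
    """Assumed intra-modal importance: items attend to their own modality."""
    return received_attention(items, items)


def inter_importance(items, other_items):
    """Assumed inter-modal importance: the other modality attends to items."""
    return received_attention(other_items, items)


def kl(p, q, eps=1e-12):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(p, q):
    """Symmetric KL divergence (an assumption, not the paper's stated loss)."""
    return 0.5 * (kl(p, q) + kl(q, p))


# Toy features: three image regions and two words in a shared 2-D space.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
words = [[0.9, 0.1], [0.2, 0.8]]

p_intra = intra_importance(regions)
p_inter = inter_importance(regions, words)
loss = consistency_loss(p_intra, p_inter)
```

Under these assumptions, `loss` is zero only when a region has the same importance in both processes, so minimizing it alongside a retrieval objective would push the two distributions toward agreement. The same computation applies with the roles of regions and words swapped.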

Journal: IEEE Transactions on Circuits and Systems for Video Technology
Publication Date: May 1, 2023
Citations: 7

Similar Papers

Learning Relation Alignment for Calibrated Cross-modal Retrieval
01 Aug 2021

Annotation Efficient Cross-Modal Retrieval with Adversarial Attentive Alignment
Po-Yao Huang ... Wenhe Liu
15 Oct 2019

Improving What Cross-Modal Retrieval Models Learn through Object-Oriented Inter- and Intra-Modal Attention Networks
Po-Yao Huang ... Alexander G Hauptmann
05 Jun 2019

Fine-Grained Visual Textual Alignment for Cross-Modal Retrieval Using Transformer Encoders
Nicola Messina ... Andrea Esuli
ACM Transactions on Multimedia Computing, Communications, and Applications | VOL. 17
12 Nov 2021
