Abstract

Image-text retrieval serves as a bridge connecting vision and language. Mainstream cross-matching methods model rich cross-modal interactions and achieve strong retrieval performance, but they are inefficient at inference time. Independent matching methods, which embed each modality separately, are far more efficient but lag in accuracy. Balancing matching efficiency and performance therefore remains a central challenge in image-text retrieval. In this paper, we propose a new Cross-modal Independent Matching Network (CIMN) for image-text retrieval. Specifically, we first use the proposed Feature Relationship Reasoning (FRR) to infer neighborhood and latent relations among the features of each modality. We then introduce Graph Pooling (GP), built on graph convolutional networks, to aggregate these features into a global semantic representation per modality. Finally, we introduce the Gravitation Loss (GL), which incorporates a notion of sample mass into training; this loss corrects matching relationships both across and within modalities and avoids the traditional triplet loss's equal treatment of all samples. Extensive experiments on the Flickr30K and MSCOCO datasets demonstrate the superiority of the proposed method: it strikes a good balance between matching efficiency and performance, outperforms comparable independent matching methods, and attains retrieval accuracy on par with some mainstream cross-matching methods at an order of magnitude lower inference time.
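To make the two core components concrete, the sketches below illustrate one plausible reading of them; the paper's exact formulations are not given in the abstract, so the adjacency construction, pooling depth, and the definition of "sample mass" here are all assumptions for illustration. The first sketch shows GCN-based graph pooling: node features (e.g., image regions or words) are propagated over a similarity-derived affinity graph, then attention-pooled into a single global vector per modality.

```python
import torch
import torch.nn as nn

class GraphPooling(nn.Module):
    """Hedged sketch of GCN-based graph pooling: propagate node features
    over a soft affinity graph, then pool them into one global vector.
    The single-layer depth and similarity-based adjacency are assumptions,
    not necessarily the paper's exact design."""
    def __init__(self, dim):
        super().__init__()
        self.gcn = nn.Linear(dim, dim)    # shared GCN weight matrix
        self.score = nn.Linear(dim, 1)    # attention-style pooling scores

    def forward(self, x):                 # x: (B, N, dim) node features
        # Soft adjacency from pairwise similarity, row-normalized.
        adj = torch.softmax(x @ x.transpose(1, 2), dim=-1)   # (B, N, N)
        h = torch.relu(self.gcn(adj @ x))                    # one GCN layer
        w = torch.softmax(self.score(h), dim=1)              # (B, N, 1)
        return (w * h).sum(dim=1)                            # (B, dim) global
```

The second sketch is a gravitation-weighted triplet loss. Following the abstract's description, the hinge term for each pair is scaled by the product of the two samples' masses, so that heavier (e.g., harder or denser-neighborhood) samples pull more strongly instead of all samples being weighted equally; `gravitation_triplet_loss` and the `mass_*` inputs are hypothetical names, with masses assumed to be one nonnegative scalar per sample.

```python
import torch
import torch.nn.functional as F

def gravitation_triplet_loss(img, txt, mass_img, mass_txt, margin=0.2):
    """Hedged sketch: an in-batch triplet ranking loss whose per-pair weight
    is the product of the two samples' masses, loosely mirroring gravitation.
    The exact CIMN formulation may differ."""
    # Cosine similarities between L2-normalized embeddings; diagonal = positives.
    img = F.normalize(img, dim=-1)
    txt = F.normalize(txt, dim=-1)
    sim = img @ txt.t()                       # (B, B)
    pos = sim.diag().unsqueeze(1)             # (B, 1)

    # Standard hinge terms over in-batch negatives, in both directions.
    cost_i2t = (margin + sim - pos).clamp(min=0)       # image anchor -> text
    cost_t2i = (margin + sim - pos.t()).clamp(min=0)   # text anchor -> image

    # "Gravitational" weight: product of the two samples' masses.
    w = mass_img.unsqueeze(1) * mass_txt.unsqueeze(0)  # (B, B)

    # Exclude the matched pairs on the diagonal, then average.
    mask = 1.0 - torch.eye(sim.size(0), device=sim.device)
    return ((w * (cost_i2t + cost_t2i)) * mask).sum() / mask.sum()
```

With uniform masses (all ones) this reduces to the standard sum-over-negatives triplet loss, which is exactly the equal-treatment behavior the abstract says GL is designed to avoid.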
