Cross-modal Graph Matching Network for Image-text Retrieval

Yuhao Cheng, Jiuchao Qian, Fei Wen, Xiaoguang Zhu, Peilin Liu

doi:10.1145/3499027

Abstract

Image-text retrieval is a fundamental cross-modal task whose core problem is learning to match images with text. Existing methods can be classified, according to whether the two modalities interact during retrieval, into independent representation matching methods and cross-interaction matching methods. Independent representation matching methods generate image and sentence embeddings independently, which makes them convenient for retrieval with hand-crafted matching measures (e.g., cosine or Euclidean distance). Cross-interaction matching methods achieve improvements by introducing interaction-based networks for inter-relation reasoning, but suffer from low retrieval efficiency. This article aims to develop a method that retains the cross-modal inter-relation reasoning of cross-interaction methods while being as efficient as independent methods. To this end, we propose the Cross-modal Graph Matching Network (CGMN), a graph-based method that explores both intra- and inter-relations without introducing network interaction. In CGMN, graphs are used for both visual and textual representation to achieve intra-relation reasoning across regions and words, respectively. Furthermore, we propose a novel graph node matching loss to learn fine-grained cross-modal correspondence and to achieve inter-relation reasoning. Experiments on the benchmark datasets MS-COCO, Flickr8K, and Flickr30K show that CGMN outperforms state-of-the-art methods in image retrieval. Moreover, CGMN is much more efficient than state-of-the-art methods that use interactive matching. The code is available at https://github.com/cyh-sj/CGMN.
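
To make the "independent representation matching" setting concrete, the following is a minimal sketch of retrieval with a hand-crafted cosine measure over embeddings that were computed separately for each modality. It is not the CGMN implementation: the function names (`l2_normalize`, `cosine_similarity`, `rank_images`) and the toy three-dimensional embeddings are assumptions made purely for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two embeddings of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_images(text_embedding, image_embeddings):
    """Return image indices ordered from most to least similar to the text."""
    scores = [cosine_similarity(text_embedding, img) for img in image_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy embeddings, invented for this example; in practice each would come
# from an image encoder or a text encoder that was run independently.
image_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
text_embedding = [0.9, 0.1, 0.0]

ranking = rank_images(text_embedding, image_embeddings)
print(ranking)
```

Because the image embeddings do not depend on the query, they can be computed once and reused for every query, whereas a cross-interaction method has to run its interaction network on each image-text pair at retrieval time.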

Journal: ACM Transactions on Multimedia Computing, Communications, and Applications
Publication Date: Mar 4, 2022
Citations: 55

Similar Papers

Image Retrieval and Classification Method Based on Euclidian Distance Between Normalized Features Including Wavelet Descriptor
Kohei Arai
International Journal of Advanced Research in Artificial Intelligence | VOL. 2
01 Jan 2013

Cross-modal independent matching network for image-text retrieval
Xiao Ke ... Wenzhong Guo
Pattern Recognition | VOL. 159
01 Mar 2025

A novel biomedical image indexing and retrieval system via deep preference learning
Shuchao Pang ... Zhezhou Yu
Computer Methods and Programs in Biomedicine | VOL. 158
06 Feb 2018

Late fusion of heterogeneous methods for multimedia image retrieval
Hugo Jair Escalante ... Luis Enrique Sucar
30 Oct 2008
