Abstract

The rapid advancement of Internet technology, driven by social media and e-commerce platforms, has facilitated the generation and sharing of multimodal data, leading to increased interest in efficient cross-modal retrieval systems. Cross-modal image-text retrieval, encompassing tasks such as image query text (IqT) retrieval and text query image (TqI) retrieval, plays a crucial role in semantic search across modalities. This paper presents a comprehensive survey of cross-modal image-text retrieval, addressing the limitations of previous studies that focused on a single perspective, such as subspace learning or deep learning models. We categorize existing models into single-tower, dual-tower, real-value representation, and binary representation models based on their structure and feature representation. Additionally, we explore the impact of multimodal Large Language Models (MLLMs) on cross-modal retrieval. Our study also provides a detailed overview of common datasets, evaluation metrics, and performance comparisons of representative methods. Finally, we identify current challenges and propose future research directions to advance the field of cross-modal image-text retrieval.
