Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval

Hailang Huang, Ziqiao Wang, Zhijie Nie, Ziyu Shang

doi:10.1609/aaai.v38i16.29789

Abstract

Current image-text retrieval methods have demonstrated impressive performance in recent years. However, they still face two problems: the inter-modal matching missing problem and the intra-modal semantic loss problem. These problems can significantly affect the accuracy of image-text retrieval. To address these challenges, we propose a novel method called Cross-modal and Uni-modal Soft-label Alignment (CUSA). Our method leverages uni-modal pre-trained models to provide soft-label supervision signals for the image-text retrieval model. Additionally, we introduce two alignment techniques, Cross-modal Soft-label Alignment (CSA) and Uni-modal Soft-label Alignment (USA), to overcome false negatives and enhance similarity recognition between uni-modal samples. Our method is designed to be plug-and-play, meaning it can be easily applied to existing image-text retrieval models without changing their original architectures. Through extensive experiments on various image-text retrieval models and datasets, we demonstrate that our method consistently improves the performance of image-text retrieval and achieves new state-of-the-art results. Furthermore, our method also boosts the uni-modal retrieval performance of image-text retrieval models, enabling them to perform universal retrieval. The code and supplementary files can be found at https://github.com/lerogo/aaai24_itr_cusa.
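
The abstract does not state the form of the alignment loss, so the following is only a minimal sketch of what soft-label supervision could look like. It assumes that a uni-modal teacher supplies a similarity matrix over a batch, that each row is turned into a soft label with a temperature softmax, and that the retrieval model's own similarity rows are pulled toward those labels with KL divergence. The KL choice, the temperature value, the toy matrices, and all function names are hypothetical and not taken from the paper or its repository.

```python
import math


def softmax(scores, temperature):
    """Turn one row of similarity scores into a probability distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def soft_label_alignment_loss(teacher_sim, student_sim, temperature=0.1):
    """Mean row-wise KL between teacher soft labels and student predictions.

    Assumption: KL divergence and the temperature are illustrative choices.
    """
    row_losses = [
        kl_divergence(softmax(t_row, temperature), softmax(s_row, temperature))
        for t_row, s_row in zip(teacher_sim, student_sim)
    ]
    return sum(row_losses) / len(row_losses)


# Toy batch of three samples. The hypothetical teacher rates samples 0 and 1
# as close to each other; the student treats them as unrelated, which is the
# false-negative situation that hard one-hot labels would reinforce.
teacher_sim = [[1.0, 0.8, 0.1],
               [0.8, 1.0, 0.2],
               [0.1, 0.2, 1.0]]
student_sim = [[1.0, 0.1, 0.1],
               [0.1, 1.0, 0.2],
               [0.1, 0.2, 1.0]]

loss = soft_label_alignment_loss(teacher_sim, student_sim)
print(f"alignment loss: {loss:.4f}")
```

Under this reading, the same function could serve both techniques: for a cross-modal term the student rows would hold image-to-text similarities, and for a uni-modal term they would hold image-to-image or text-to-text similarities. Because the loss only consumes similarity matrices, it could be added to an existing training objective without touching the model architecture, which is consistent with the plug-and-play claim.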

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Mar 24, 2024
Citations: 1

Similar Papers

Integrating listwise ranking into pairwise-based image-text retrieval
Zheng Li ... Yanjun Wang
Knowledge-Based Systems | VOL. 287
23 Jan 2024

Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training.
Chong Liu ... Liang Wang
IEEE Transactions on Image Processing | VOL. PP
01 Jan 2023

USER: Unified Semantic Enhancement With Momentum Contrast for Image-Text Retrieval.
Yan Zhang ... Zhong Ji
IEEE Transactions on Image Processing | VOL. 33
01 Jan 2024

High-Accuracy Tomato Leaf Disease Image-Text Retrieval Method Utilizing LAFANet.
Jiaxin Xu ... Guoxiong Zhou
Plants | VOL. 13
23 Apr 2024
