Abstract

Cross-modal attention mechanisms have been widely applied to the image-text matching task, achieving remarkable improvements thanks to their capability of learning fine-grained relevance across modalities. However, the cross-modal attention models in existing methods can be sub-optimal and inaccurate because no direct supervision is provided during training. In this work, we propose two novel training strategies, the Contrastive Content Resourcing (CCR) and Contrastive Content Swapping (CCS) constraints, to address this limitation. These constraints supervise the training of cross-modal attention models in a contrastive learning manner without requiring explicit attention annotations. They are plug-in training strategies and can be generally integrated into existing cross-modal attention models. Additionally, we introduce three metrics, Attention Precision, Attention Recall, and Attention F1-Score, to quantitatively measure the quality of the learned attention. We evaluate the proposed constraints by incorporating them into four state-of-the-art cross-modal attention-based image-text matching models. Experimental results on the Flickr30k and MS-COCO datasets demonstrate that integrating these constraints generally improves the models in terms of both retrieval performance and the attention metrics.
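The abstract does not spell out how the attention metrics are computed. As a rough illustration only, the following is a minimal sketch of one plausible reading: Attention Precision, Recall, and F1-Score for a single image-text pair, assuming word-to-region attention weights and binary ground-truth relevance annotations. The function name `attention_prf` and the `threshold` parameter are hypothetical, not taken from the paper.

```python
import numpy as np

def attention_prf(attn, gt_align, threshold=0.1):
    """Attention Precision / Recall / F1 for one image-text pair (a sketch).

    attn      : (num_words, num_regions) cross-modal attention weights,
                each row summing to 1 over the image regions.
    gt_align  : (num_words, num_regions) binary matrix; gt_align[i, j] = 1
                iff region j is relevant to word i.
    threshold : weights above this value count as "attended" (assumed cutoff).
    """
    pred = attn > threshold                  # regions the model attends to
    gt = gt_align.astype(bool)               # regions it should attend to

    tp = np.logical_and(pred, gt).sum()      # correctly attended pairs
    precision = tp / max(pred.sum(), 1)      # attended & relevant / attended
    recall = tp / max(gt.sum(), 1)           # attended & relevant / relevant
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

# Toy example: 2 words, 3 regions.
attn = np.array([[0.70, 0.20, 0.10],
                 [0.05, 0.05, 0.90]])
gt = np.array([[1, 0, 0],
               [0, 0, 1]])
print(attention_prf(attn, gt))  # -> precision 0.67, recall 1.0, F1 0.8
```

Under this reading, precision penalizes attention spilled onto irrelevant regions while recall penalizes relevant regions the model misses; the actual definitions in the paper may differ.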
