Abstract

Recently, image-text matching has been intensively explored to bridge vision and language. Previous methods model the inter-modality relationship of an image-text pair from a single-view feature. However, it is difficult to capture all the relevant information through a single inter-modality relationship. In this paper, a novel Multi-View Inter-Modality Representation with Progressive Fusion (MIRPF) is developed to explore inter-modality relationships from multi-view features. The multi-view strategy provides more complementary and global semantic clues than single-view approaches. In particular, a multi-view inter-modality representation network is constructed to generate multiple inter-modality representations, which offer diverse views for discovering latent image-text relationships. Furthermore, a progressive fusion module fuses the inter-modality features stepwise, fully exploiting the inherent complementarity between different views. Extensive experiments on Flickr30K and MSCOCO verify the superiority of MIRPF over several existing approaches. The code is available at: https://github.com/jasscia18/MIRPF.
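To make the two ideas in the abstract concrete, below is a minimal PyTorch sketch of (a) generating multiple inter-modality representations via cross-attention between text words and image regions, and (b) fusing those view-level representations stepwise. All class names, dimensions, and the fusion gate here are hypothetical illustrations under plausible assumptions, not the authors' released implementation; see the linked repository for the actual code.

```python
# Minimal sketch: multi-view inter-modality representations + progressive
# fusion. Names (InterModalityView, ProgressiveFusion, MIRPFSketch) are
# hypothetical, not taken from the paper's released code.
import torch
import torch.nn as nn


class InterModalityView(nn.Module):
    """One 'view': text words cross-attend over image regions."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        # Text queries attend to image regions -> inter-modality features.
        out, _ = self.attn(query=txt, key=img, value=img)
        # Pool over words to get one representation per image-text pair.
        return self.norm(out).mean(dim=1)


class ProgressiveFusion(nn.Module):
    """Fuses the K view representations one step at a time."""

    def __init__(self, dim: int, num_views: int):
        super().__init__()
        # One fusion layer per step; each step merges the running fused
        # feature with the next view's feature (a simple concat-project
        # stand-in for whatever fusion the paper actually uses).
        self.steps = nn.ModuleList(
            nn.Linear(2 * dim, dim) for _ in range(num_views - 1)
        )

    def forward(self, views: list[torch.Tensor]) -> torch.Tensor:
        fused = views[0]
        for step, view in zip(self.steps, views[1:]):
            fused = torch.relu(step(torch.cat([fused, view], dim=-1)))
        return fused


class MIRPFSketch(nn.Module):
    def __init__(self, dim: int = 256, num_views: int = 3):
        super().__init__()
        self.views = nn.ModuleList(
            InterModalityView(dim) for _ in range(num_views)
        )
        self.fusion = ProgressiveFusion(dim, num_views)

    def forward(self, img: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        reps = [view(img, txt) for view in self.views]
        return self.fusion(reps)  # (batch, dim) matching feature


if __name__ == "__main__":
    img = torch.randn(2, 36, 256)  # 36 region features per image
    txt = torch.randn(2, 12, 256)  # 12 word features per caption
    print(MIRPFSketch()(img, txt).shape)  # torch.Size([2, 256])
```

The stepwise loop in ProgressiveFusion is the key structural point: each view is absorbed into the running representation one at a time, so later steps can exploit complementary cues the earlier views missed, rather than averaging all views in a single shot.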
