Abstract

Nowadays, cross-modal retrieval plays an important role in flexibly finding useful information across different modalities of data. Effectively measuring the similarity between different modalities of data is the key to cross-modal retrieval. Different modalities such as image and text have an imbalanced and complementary relationship, and they contain unequal amounts of information when describing the same semantics. For example, images often contain details that cannot be conveyed by textual descriptions, and vice versa. Existing works based on Deep Neural Networks (DNNs) mostly construct one common space for different modalities to find the latent alignments between them, which loses their exclusive modality-specific characteristics. Therefore, we propose a modality-specific cross-modal similarity measurement (MCSM) approach that constructs an independent semantic space for each modality and adopts an end-to-end framework to directly generate modality-specific cross-modal similarity without explicit common representation. For each semantic space, modality-specific characteristics within one modality are fully exploited by a recurrent attention network, while the data of the other modality is projected into this space with attention-based joint embedding, which utilizes the learned attention weights to guide fine-grained cross-modal correlation learning and captures the imbalanced and complementary relationship between different modalities. Finally, the complementarity between the semantic spaces for different modalities is explored by adaptive fusion of the modality-specific cross-modal similarities to perform cross-modal retrieval. Experiments on the widely-used Wikipedia, Pascal Sentence, and MS-COCO datasets as well as our constructed large-scale XMediaNet dataset verify the effectiveness of our proposed approach, which outperforms 9 state-of-the-art methods.
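
To make the described architecture concrete, the following is a minimal sketch (not the authors' released code) of the MCSM idea: each modality anchors its own semantic space, the other modality is projected into that space via an attention-based joint embedding, and the two modality-specific similarities are adaptively fused into a retrieval score. All module names, feature dimensions, and the simple learnable fusion weight are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalitySpecificSpace(nn.Module):
    """One semantic space anchored on a single modality (hypothetical sketch)."""
    def __init__(self, anchor_dim, other_dim, embed_dim=256):
        super().__init__()
        # Recurrent attention over the anchor modality's local features
        # (e.g. image regions or text words); stands in for the paper's
        # recurrent attention network.
        self.rnn = nn.GRU(anchor_dim, embed_dim, batch_first=True)
        self.attn = nn.Linear(embed_dim, 1)
        # Projection of the other modality into this space (joint embedding).
        self.project_other = nn.Linear(other_dim, embed_dim)

    def forward(self, anchor_feats, other_feat):
        # anchor_feats: (B, T, anchor_dim); other_feat: (B, other_dim)
        hidden, _ = self.rnn(anchor_feats)                 # (B, T, embed_dim)
        weights = F.softmax(self.attn(hidden), dim=1)      # attention weights (B, T, 1)
        anchor_emb = (weights * hidden).sum(dim=1)         # attention-pooled anchor
        other_emb = self.project_other(other_feat)         # other modality in this space
        # Modality-specific cross-modal similarity (cosine).
        return F.cosine_similarity(anchor_emb, other_emb, dim=-1)

class MCSMSketch(nn.Module):
    """Two modality-specific spaces plus adaptive fusion of their similarities."""
    def __init__(self, img_dim=2048, txt_dim=300, embed_dim=256):
        super().__init__()
        self.image_space = ModalitySpecificSpace(img_dim, txt_dim, embed_dim)
        self.text_space = ModalitySpecificSpace(txt_dim, img_dim, embed_dim)
        # Learnable fusion weight; the paper's adaptive fusion is likely richer.
        self.fusion_logit = nn.Parameter(torch.zeros(1))

    def forward(self, img_regions, img_global, txt_words, txt_global):
        sim_img = self.image_space(img_regions, txt_global)  # similarity in image space
        sim_txt = self.text_space(txt_words, img_global)     # similarity in text space
        alpha = torch.sigmoid(self.fusion_logit)
        return alpha * sim_img + (1 - alpha) * sim_txt       # fused retrieval score

# Toy usage: 4 image-text pairs with 36 region features and 20 word features.
model = MCSMSketch()
score = model(torch.randn(4, 36, 2048), torch.randn(4, 2048),
              torch.randn(4, 20, 300), torch.randn(4, 300))
print(score.shape)  # torch.Size([4])
```

This sketch only illustrates the separation of semantic spaces and the fusion step; the actual recurrent attention and adaptive fusion mechanisms are specified in the paper itself.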
