Integrating Multi-Label Contrastive Learning With Dual Adversarial Graph Neural Networks for Cross-Modal Retrieval

Shengsheng Qian, Quan Fang, Dizhan Xue, Changsheng Xu

doi:10.1109/tpami.2022.3188547

Abstract

With the growing amount of multimodal data, cross-modal retrieval has attracted increasing attention and become an active research topic. Most existing techniques map multimodal data into a common representation space in which semantic similarities between samples can be measured directly across modalities. However, these approaches may suffer from three limitations: 1) they attempt to bridge the modality gap through losses defined in the common representation space, which may not be sufficient to eliminate the heterogeneity between modalities; 2) they treat labels as independent entities and ignore label relationships, which hinders the establishment of semantic connections across multimodal data; 3) they ignore the non-binary nature of label similarity in multi-label scenarios, which can lead to poor alignment between representation similarity and label similarity. To tackle these problems, in this article we propose two models that learn discriminative and modality-invariant representations for cross-modal retrieval. First, dual generative adversarial networks are built to project multimodal data into a common representation space. Second, to model label relation dependencies and develop inter-dependent classifiers, we employ multi-hop graph neural networks (a Probabilistic GNN and an Iterative GNN), together with a layer aggregation mechanism that exploits propagation information from different hops. Third, we propose a novel soft multi-label contrastive loss for cross-modal retrieval that uses a soft positive sampling probability to align representation similarity with label similarity. Additionally, to support incomplete-modal learning, which broadens the range of applications, we propose a modal reconstruction mechanism that generates missing features. Extensive experiments on three widely used benchmark datasets, NUS-WIDE, MIRFlickr, and MS-COCO, demonstrate the superiority of the proposed method.
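The abstract does not give a formula for the soft multi-label contrastive loss, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes that label similarity is the cosine similarity between multi-hot label vectors (which is non-binary when samples share some but not all labels), that the "soft positive sampling probability" is that similarity normalized over the candidates in a batch, and that the loss is a temperature-scaled (`tau`) cross-entropy between these soft targets and a softmax over cross-modal representation similarities, averaged over both retrieval directions. All function names are hypothetical.

```python
# Hypothetical sketch: NOT the paper's exact loss. Assumptions: label similarity
# = cosine of multi-hot label vectors; soft positive probability = row-normalized
# label similarity; loss = soft-target cross-entropy, averaged over both directions.
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def soft_positive_probs(labels_a: Sequence[Vector],
                        labels_b: Sequence[Vector]) -> List[List[float]]:
    """p[i][j]: assumed probability that candidate j (modality B) is a positive
    for anchor i (modality A), proportional to their label similarity."""
    probs = []
    for la in labels_a:
        sims = [cosine(la, lb) for lb in labels_b]
        total = sum(sims)
        probs.append([s / total for s in sims] if total > 0 else [0.0] * len(sims))
    return probs


def soft_contrastive_one_way(anchors: Sequence[Vector], candidates: Sequence[Vector],
                             labels_a: Sequence[Vector], labels_b: Sequence[Vector],
                             tau: float = 0.1) -> float:
    """Cross-entropy between soft positive probabilities and a softmax over
    temperature-scaled representation similarities (one retrieval direction)."""
    probs = soft_positive_probs(labels_a, labels_b)
    losses = []
    for i, anchor in enumerate(anchors):
        if sum(probs[i]) == 0.0:
            continue  # anchor shares no label with any candidate: no positives
        logits = [cosine(anchor, c) / tau for c in candidates]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(-sum(p * (z - log_z) for p, z in zip(probs[i], logits)))
    return sum(losses) / len(losses) if losses else 0.0


def soft_multilabel_contrastive_loss(img: Sequence[Vector], txt: Sequence[Vector],
                                     img_labels: Sequence[Vector],
                                     txt_labels: Sequence[Vector],
                                     tau: float = 0.1) -> float:
    """Symmetric loss: average of image-to-text and text-to-image directions."""
    i2t = soft_contrastive_one_way(img, txt, img_labels, txt_labels, tau)
    t2i = soft_contrastive_one_way(txt, img, txt_labels, img_labels, tau)
    return 0.5 * (i2t + t2i)
```

For example, with labels `[[1, 1, 0], [1, 0, 0], [0, 0, 1]]`, embeddings that mirror the label vectors give a lower loss than the same text embeddings in permuted order, which is the kind of alignment between representation similarity and label similarity the abstract describes. Because the targets are soft, the minimum of this loss is the entropy of the target distribution rather than zero.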

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication Date: Jan 1, 2022
Citations: 16

Similar Papers

Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval
Shengsheng Qian ... Quan Fang
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 35
18 May 2021

Adaptive Label-Aware Graph Convolutional Networks for Cross-Modal Retrieval
Shengsheng Qian ... Dizhan Xue
IEEE Transactions on Multimedia | VOL. 24
01 Jan 2021

An Efficient Approach for Geo-Multimedia Cross-Modal Retrieval
Lei Zhu ... Longzhi Sun
IEEE Access | VOL. 7
01 Jan 2019

Deep supervised multimodal semantic autoencoder for cross‐modal retrieval
Yu Tian ... Qingsong Liu
Computer Animation and Virtual Worlds | VOL. 31
01 Jul 2020
