Abstract

Cross-modal retrieval aims to search for samples of one modality using queries from another, and is an active topic in the multimedia community. However, two main challenges, namely the heterogeneity gap and semantic interaction across different modalities, have not yet been addressed effectively. Reducing the heterogeneity gap improves cross-modal similarity measurement, while modeling cross-modal semantic interaction captures semantic correlations more accurately. To this end, this paper presents a novel end-to-end framework called Dual Attention Generative Adversarial Network (DA-GAN). This technique is an adversarial semantic representation model with a dual attention mechanism, i.e., intra-modal attention and inter-modal attention. Intra-modal attention focuses on the important semantic features within a modality, while inter-modal attention explores the semantic interaction between different modalities to represent high-level semantic correlations more precisely. A dual adversarial learning strategy is designed to generate modality-invariant representations, which efficiently reduces cross-modal heterogeneity. Experiments on three commonly used benchmarks show that DA-GAN outperforms its competitors.
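To make the dual attention idea concrete, the following is a minimal sketch, not the authors' implementation: intra-modal attention lets features of one modality attend to each other, while inter-modal attention lets features of one modality (e.g., image regions) attend to those of another (e.g., text words). The scaled dot-product form, function names, and toy dimensions are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def intra_modal_attention(feats):
    """Attention within one modality: each feature (e.g., image region)
    attends to all other features of the same modality (simplified sketch)."""
    scores = feats @ feats.T / np.sqrt(feats.shape[1])
    return softmax(scores, axis=1) @ feats

def inter_modal_attention(query_feats, context_feats):
    """Attention across modalities: e.g., image regions attend to text
    words, modeling the semantic interaction between the two modalities."""
    scores = query_feats @ context_feats.T / np.sqrt(query_feats.shape[1])
    return softmax(scores, axis=1) @ context_feats

# toy example: 4 image regions and 6 text words, each an 8-d feature
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
txt = rng.normal(size=(6, 8))

img_intra = intra_modal_attention(img)       # shape (4, 8)
img_inter = inter_modal_attention(img, txt)  # shape (4, 8)
```

In DA-GAN these attended representations would then be fed to the adversarial branch, which pushes image and text representations toward a modality-invariant common space.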

Highlights

  • Cross-modal retrieval [1,2] is a hot issue in the field of multimedia [3]

  • To implement the above idea, this paper proposes a new approach, named Dual Attention Generative Adversarial Network (DA-GAN)

  • We propose a novel Dual Attention Generative Adversarial Network (DA-GAN) for cross-modal retrieval, which integrates adversarial learning with a dual attention mechanism


Summary

Introduction

Cross-modal retrieval [1,2] is a hot issue in the field of multimedia [3]. As shown in Figure 1, it aims to find objects of one modality via queries of another modality. Canonical Correlation Analysis (CCA) [11] is adopted by many studies [12,13,14,15] to learn correlations between cross-modal instances sharing the same category label. Although these CCA-based methods are supported by classical statistical theory, they cannot represent complex non-linear semantic correlations. [Figure 1: example image–text pairs illustrating cross-modal retrieval.]
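For context, the CCA baseline mentioned above can be sketched in a few lines: classical linear CCA finds projection matrices for the two modalities that maximize the correlation of the projected pairs, so retrieval reduces to nearest-neighbour search in the shared space. This is a self-contained sketch of textbook CCA (via SVD of the whitened cross-covariance), not the code of any cited method; the toy "image" and "text" features and all names are illustrative.

```python
import numpy as np

def cca(X, Y, k=2, reg=1e-6):
    """Classical linear CCA: SVD of the whitened cross-covariance.
    Returns projections (Wx, Wy) and the top-k canonical correlations."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n

    def inv_sqrt(S):
        # inverse matrix square root of a symmetric PD matrix
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    U, s, Vt = np.linalg.svd(inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy))
    return inv_sqrt(Sxx) @ U[:, :k], inv_sqrt(Syy) @ Vt[:k].T, s[:k]

# toy paired features sharing a 3-d latent semantic factor
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 3))
X = z @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(200, 10))  # "image"
Y = z @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(200, 8))    # "text"

Wx, Wy, corrs = cca(X, Y, k=3)
# (X - mean) @ Wx and (Y - mean) @ Wy live in a common space for retrieval
```

Because the projections are linear, such a model cannot capture non-linear semantic correlations, which is the limitation the deep adversarial approach targets.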

Results
Cross-Modal Retrieval
Attention Models
Generative Adversarial Network
Problem Definition
Review of Generative Adversarial Networks
Methodology
Overview of DA-GAN
Visual Feature Learning
Textual Feature Learning
Semantic Grouping of Samples
Adversarial Learning with Dual Attention
Intra-Attention
Inter-Attention
Discriminative Model
Optimization
Implementation Details
Datasets
Competitors
Results on Wikipedia Dataset
Traditional Method
CM-GANs
Results on Nus-Wide Dataset
Results on Pascal Sentences Dataset
Conclusions