Robust Multimodal Representation Learning With Evolutionary Adversarial Attention Networks

Feiran Huang, Alireza Jolfaei, Ali Kashif Bashir

doi:10.1109/tevc.2021.3066285

Abstract

Multimodal representation learning benefits many multimedia-oriented applications, such as social image recognition and visual question answering. The different modalities of the same instance (e.g., a social image and its corresponding description) are usually correlated and complementary. Most existing approaches to multimodal representation learning do not effectively model the deep correlation between modalities. Moreover, these approaches have difficulty handling the noise within social images. In this article, we propose a deep learning-based approach named evolutionary adversarial attention networks (EAANs), which combines the attention mechanism with adversarial networks through evolutionary training, for robust multimodal representation learning. Specifically, a two-branch visual-textual attention model is proposed to correlate visual and textual content for joint representation. Then, adversarial networks are employed to regularize the representation by matching its posterior distribution to the given priors. Finally, the attention model and adversarial networks are integrated into an evolutionary training framework for robust multimodal representation learning. Extensive experiments have been conducted on four real-world datasets: PASCAL, MIR, CLEF, and NUS-WIDE. Substantial performance improvements on image classification and tag recommendation demonstrate the superiority of the proposed approach.
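The two-branch visual-textual attention idea can be illustrated with a minimal sketch. The abstract does not give the attention formulation, so everything here is an assumption: scaled dot-product scoring, mean-pooled query vectors, concatenation as the fusion step, and all function names (`attend`, `joint_representation`) are hypothetical. It is not the authors' implementation, and it omits the adversarial regularization and evolutionary training stages entirely.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean(vectors):
    # Element-wise mean of equally sized vectors.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def attend(query, keys):
    # Assumed scoring: scaled dot product between one query and each key.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    pooled = [sum(w * k[i] for w, k in zip(weights, keys))
              for i in range(len(query))]
    return weights, pooled

def joint_representation(regions, words):
    # Branch 1: the text summary attends over image region features.
    _, visual = attend(mean(words), regions)
    # Branch 2: the image summary attends over word features.
    _, textual = attend(mean(regions), words)
    # Assumed fusion: concatenate the two attended vectors.
    return visual + textual

# Toy features: two image regions and two word vectors of dimension 2.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [1.0, 0.0]]
joint = joint_representation(regions, words)
```

In this toy input the words point along the first axis, so the visual branch puts more weight on the first region than on the second.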

Journal: IEEE transactions on evolutionary computation : a publication of the IEEE Neural Networks Council
Publication Date: Oct 1, 2021
Citations: 12
License type: other-oa

Similar Papers

Mutual Information Regularization for Weakly-Supervised RGB-D Salient Object Detection
Aixuan Li ... Yuxin Mao
IEEE transactions on circuits and systems for video technology : a publication of the Circuits and Systems Society | VOL. 34
01 Jan 2024

Multimodal Machine Learning: Integrating Language, Vision and Speech
Louis-Philippe Morency ... Tadas Baltrušaitis
01 Jan 2017

Multimodal Learning of Social Image Representation by Exploiting Social Relations.
Feiran Huang ... Xiaoming Zhang
IEEE transactions on cybernetics | VOL. 51
17 Feb 2021

Learning Multimodal Representations by Symmetrically Transferring Local Structures
Bin Dong ... Songlei Jian
13 Sep 2020
