Adversarial Learning With Multi-Modal Attention for Visual Question Answering.

Yun Liu, Zhoujun Li, Lei Cheng, Xiaoming Zhang, Feiran Huang

doi:10.1109/tnnls.2020.3016083

Abstract

Visual question answering (VQA) is a challenging task that has attracted extensive research attention. It aims to learn a joint representation of a question-image pair from which the answer can be inferred. Most existing methods focus on exploring the multi-modal correlation between the question and the image to learn this joint representation. However, these methods do not fully capture answer-related information, so the learned representation does not effectively reflect the answer to the question. To tackle this problem, we propose a novel model for VQA, adversarial learning with multi-modal attention (ALMA), in which an adversarial learning-based framework learns a joint representation that effectively reflects answer-related information. Specifically, multi-modal attention with a Siamese similarity learning method is used to build two embedding generators: a question-image embedding generator and a question-answer embedding generator. Adversarial learning is then conducted as an interplay between the two embedding generators and an embedding discriminator. The generators aim to produce modality-invariant representations of the question-image and question-answer pairs, whereas the embedding discriminator aims to distinguish between the two representations. Both the multi-modal attention module and the adversarial networks are integrated into a unified end-to-end framework for answer inference. Experiments on three benchmark data sets show that ALMA performs favorably compared with state-of-the-art approaches.
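
The abstract describes the generator/discriminator interplay only at a high level. What follows is a toy, pure-Python sketch under our own assumptions, not the authors' ALMA implementation: question-guided soft attention (`attend`) pools a set of feature vectors; one hypothetical generator function (`embed`) is reused for both the question-image and the question-answer branch to mimic Siamese-style weight sharing; and a hypothetical logistic discriminator (`discriminator`, with made-up weights `w` and bias `b`) is scored with a standard binary cross-entropy GAN objective. All function names, the element-wise fusion, and the label convention (question-answer = 1, question-image = 0) are illustrative assumptions. The sketch only computes the two losses; the alternating gradient updates of a real training loop are not shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, items):
    """Question-guided soft attention: score each item by its dot product
    with the query, normalise with softmax, return the weighted sum."""
    weights = softmax([dot(query, v) for v in items])
    dim = len(items[0])
    return [sum(w * v[d] for w, v in zip(weights, items)) for d in range(dim)]


def embed(question, items):
    """Hypothetical generator shared by BOTH branches (the Siamese-style
    weight sharing is our assumption): fuse the question with its attended
    context by element-wise product."""
    context = attend(question, items)
    return [q * c for q, c in zip(question, context)]


def sigmoid(z):
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def discriminator(embedding, w, b):
    """Hypothetical logistic discriminator: estimated probability that the
    embedding came from the question-answer branch."""
    return sigmoid(dot(w, embedding) + b)


def adversarial_losses(qi_emb, qa_emb, w, b, eps=1e-12):
    """Binary cross-entropy losses under the assumed labels
    question-answer = 1, question-image = 0.

    d_loss: the discriminator is rewarded for telling the embeddings apart.
    g_loss: the generators are scored with flipped labels, so they are
    rewarded when the discriminator misclassifies both embeddings.
    """
    p_qa = discriminator(qa_emb, w, b)
    p_qi = discriminator(qi_emb, w, b)
    d_loss = -(math.log(p_qa + eps) + math.log(1.0 - p_qi + eps))
    g_loss = -(math.log(p_qi + eps) + math.log(1.0 - p_qa + eps))
    return d_loss, g_loss


# Toy usage with made-up 3-d features (illustrative values only).
question = [0.5, -0.2, 0.1]
image_regions = [[0.3, 0.1, 0.0], [0.9, -0.4, 0.2]]
answer_words = [[0.4, -0.1, 0.3]]
qi = embed(question, image_regions)
qa = embed(question, answer_words)
d_loss, g_loss = adversarial_losses(qi, qa, w=[0.1, 0.1, 0.1], b=0.0)
```

When the discriminator outputs 0.5 for both embeddings (for example with all-zero weights), both losses equal 2 ln 2, the point at which it can no longer separate the two representations. Minimising `d_loss` and `g_loss` in alternation pushes the two embeddings toward a shared, modality-invariant space.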

Journal: IEEE Transactions on Neural Networks and Learning Systems
Publication Date: Aug 24, 2020
Citations: 16

Similar Papers

Adversarial Learning of Answer-Related Representation for Visual Question Answering
Yun Liu ... Zhoujun Li
17 Oct 2018

Adversarial Learning to Improve Question Image Embedding in Medical Visual Question Answering
Kaveesha Silva ... Thanuja Maheepala
27 Jul 2022

Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation
Arijit Ray ... Giedrius Burachas
01 Jan 2019

ALSA: Adversarial Learning of Supervised Attentions for Visual Question Answering.
Yun Liu ... Bo Zhang
IEEE Transactions on Cybernetics | Vol. 52
11 Nov 2020
