Abstract

Visual question answering (VQA) has gained increasing attention in both natural language processing and computer vision. The attention mechanism plays a crucial role in relating the question to meaningful image regions for answer inference. However, most existing VQA methods 1) learn the attention distribution either from free-form regions or from detection boxes in the image, which is intractable for answering questions about the foreground object and the background form, respectively, and 2) neglect the prior knowledge of human attention and learn the attention distribution with an unguided strategy. To fully exploit the advantages of attention, the learned attention distribution should focus more on the question-related image regions, as human attention does, for questions about both the foreground object and the background form. To this end, this article proposes a novel VQA model, called adversarial learning of supervised attentions (ALSA). Specifically, two supervised attention modules, 1) free-form-based and 2) detection-based, are designed to exploit the prior knowledge of human attention for attention distribution learning. To effectively learn the correlations between the question and the image from different views, that is, free-form regions and detection boxes, an adversarial learning mechanism is implemented as an interplay between the two supervised attention modules. The adversarial learning reinforces the two attention modules mutually, making the learned multiview features more effective for answer inference. Experiments on three commonly used VQA datasets confirm the favorable performance of ALSA.
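The abstract only outlines the architecture, so the following is a minimal PyTorch-style sketch of the dual-view idea: one question-guided attention branch over free-form grid features, one over detection-box features, and a discriminator whose adversarial signal pushes the two attended views to agree. All module names, dimensions, and the discriminator-based interplay are illustrative assumptions, not the authors' exact formulation; the supervised part (an extra loss aligning each attention map with human attention) is omitted here.

```python
# Hypothetical sketch of the ALSA-style dual supervised attention, not the paper's code.
import torch
import torch.nn as nn

class QuestionGuidedAttention(nn.Module):
    """Attend over region features (free-form grid cells or detection boxes)
    conditioned on the question embedding."""
    def __init__(self, region_dim, question_dim, hidden_dim=512):
        super().__init__()
        self.proj_r = nn.Linear(region_dim, hidden_dim)
        self.proj_q = nn.Linear(question_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, regions, question):
        # regions: (B, N, region_dim), question: (B, question_dim)
        joint = torch.tanh(self.proj_r(regions) + self.proj_q(question).unsqueeze(1))
        attn = torch.softmax(self.score(joint), dim=1)   # (B, N, 1) attention weights
        attended = (attn * regions).sum(dim=1)           # (B, region_dim) attended feature
        return attended, attn.squeeze(-1)

class ALSASketch(nn.Module):
    """Two attention branches plus a discriminator that tries to tell which view
    (free-form vs. detection) an attended feature came from; training the branches
    to fool it is one plausible reading of the adversarial interplay."""
    def __init__(self, region_dim=2048, question_dim=1024, num_answers=3000):
        super().__init__()
        self.free_form_attn = QuestionGuidedAttention(region_dim, question_dim)
        self.detection_attn = QuestionGuidedAttention(region_dim, question_dim)
        self.discriminator = nn.Sequential(
            nn.Linear(region_dim, 512), nn.ReLU(), nn.Linear(512, 1))
        self.classifier = nn.Linear(region_dim * 2, num_answers)

    def forward(self, grid_feats, box_feats, question):
        ff, ff_map = self.free_form_attn(grid_feats, question)
        det, det_map = self.detection_attn(box_feats, question)
        answer_logits = self.classifier(torch.cat([ff, det], dim=-1))
        # Discriminator scores for both views: the discriminator learns to separate
        # them, while the attention branches learn to make them indistinguishable.
        view_logits = self.discriminator(torch.cat([ff, det], dim=0))
        return answer_logits, view_logits, (ff_map, det_map)
```

In a full training loop, the answer logits would feed a standard VQA classification loss, the attention maps would be supervised against human attention priors, and the view logits would drive an adversarial (min-max) objective between the discriminator and the two attention branches.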
