Abstract
Visual Question Answering (VQA) is a challenging task that requires understanding visual images and natural language questions simultaneously. In the open-ended VQA task, most previous solutions focus on understanding the question and image contents, as well as their correlations. However, they mostly reason about the answer in a single stage, so the semantics carried by the generated answers are largely ignored. In this paper, we propose a novel approach, termed Cascaded-Answering Model (CAM), which extends the conventional one-stage VQA model to a two-stage model. The proposed model can therefore fully exploit the semantics embedded in the predicted answers. Specifically, CAM is composed of two cascaded answering modules: a Candidate Answer Generation (CAG) module and a Final Answer Prediction (FAP) module. In the CAG module, we select multiple relevant candidates from the answers generated by a typical co-attention-based VQA approach. In the FAP module, we integrate the question and image information with the semantics extracted from the selected candidate answers to predict the final answer. Experimental results demonstrate that the proposed model produces high-quality candidate answers and achieves state-of-the-art performance on three large benchmark datasets: VQA-1.0, VQA-2.0, and VQA-CP v2.
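As a rough illustration of the cascaded structure described above, the following is a minimal PyTorch-style sketch. The module names, feature dimensions, number of candidates K, and scoring layers are illustrative assumptions, not details taken from the paper itself.

```python
import torch
import torch.nn as nn

class CandidateAnswerGeneration(nn.Module):
    """Stage 1 (CAG, sketch): a co-attention VQA scorer that ranks the answer
    vocabulary and keeps the top-K candidates. Dimensions are illustrative."""
    def __init__(self, feat_dim=512, num_answers=3000, k=5):
        super().__init__()
        self.k = k
        self.classifier = nn.Linear(feat_dim, num_answers)

    def forward(self, fused_qv):
        # fused_qv: joint question-image feature from a co-attention encoder
        scores = self.classifier(fused_qv)               # (B, num_answers)
        topk_scores, topk_ids = scores.topk(self.k, -1)  # (B, K)
        return topk_ids, topk_scores

class FinalAnswerPrediction(nn.Module):
    """Stage 2 (FAP, sketch): re-scores the K candidates by combining the joint
    question-image feature with an embedding of each candidate answer."""
    def __init__(self, feat_dim=512, num_answers=3000):
        super().__init__()
        self.ans_embed = nn.Embedding(num_answers, feat_dim)
        self.scorer = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 1))

    def forward(self, fused_qv, cand_ids):
        cand_emb = self.ans_embed(cand_ids)                  # (B, K, D)
        qv = fused_qv.unsqueeze(1).expand_as(cand_emb)       # (B, K, D)
        logits = self.scorer(torch.cat([qv, cand_emb], -1))  # (B, K, 1)
        return logits.squeeze(-1)                            # (B, K)

# Cascade: generate candidates, then predict the final answer among them.
cag, fap = CandidateAnswerGeneration(), FinalAnswerPrediction()
fused_qv = torch.randn(8, 512)  # stand-in for a co-attention encoder output
cand_ids, _ = cag(fused_qv)
best = fap(fused_qv, cand_ids).argmax(-1, keepdim=True)
final_answer = cand_ids.gather(1, best)
```

The key design point the sketch tries to convey is that the second stage conditions on explicit answer semantics (the candidate embeddings) rather than classifying over the full answer vocabulary in one shot.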