Abstract

Collaborative reasoning for knowledge-based visual question answering is challenging but essential for understanding the features of images and questions. Previous methods either jointly fuse all kinds of features with an attention mechanism or use handcrafted rules to generate a layout for compositional reasoning; both approaches lack an explicit visual reasoning process and introduce a large number of parameters for predicting the correct answer. To conduct visual reasoning on arbitrary image–question pairs, in this paper we propose a novel reasoning model based on a question-guided tree structure with a knowledge base (QGTSKB) to address these problems. The model consists of four neural module networks: an attention model that locates attended regions from the image features and question embeddings via an attention mechanism; a gated reasoning model that forgets and updates the fused features; a fusion reasoning model that mines high-level semantics from the attended visual features and the knowledge base; and a knowledge-based fact model that compensates for missing visual and textual information with external knowledge. Our model therefore performs visual analysis and reasoning based on tree structures, a knowledge base, and the four neural module networks. Experimental results show that our model achieves superior performance over existing methods on the VQA v2.0 and CLEVR datasets, and visual reasoning experiments demonstrate the interpretability of the model.
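The composition described above can be pictured as a single tree node that chains the four modules. The following is a minimal sketch, assuming a PyTorch implementation evaluated bottom-up over the question-guided tree; all class, argument, and dimension names (QGTSKBNode, vis_dim, kb_fact, child_state, etc.) are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn


class QGTSKBNode(nn.Module):
    """One tree node: attend over regions, fuse with a KB fact, then gate the child state."""

    def __init__(self, vis_dim, txt_dim, kb_dim, hid_dim):
        super().__init__()
        self.attention = nn.Linear(vis_dim + txt_dim, 1)   # attention model
        self.fusion = nn.Linear(vis_dim + kb_dim, hid_dim)  # fusion reasoning model
        self.gate = nn.GRUCell(hid_dim, hid_dim)            # gated reasoning model

    def forward(self, img_feats, word_emb, kb_fact, child_state):
        # img_feats: (B, R, vis_dim) region features; word_emb: (B, txt_dim) node word encoding
        # kb_fact: (B, kb_dim) retrieved external-knowledge embedding (knowledge-based fact model)
        # child_state: (B, hid_dim) hidden state passed up from the child node
        query = word_emb.unsqueeze(1).expand(-1, img_feats.size(1), -1)
        scores = self.attention(torch.cat([img_feats, query], dim=-1)).squeeze(-1)
        attn = torch.softmax(scores, dim=1)                            # (B, R) attention map
        attended = torch.bmm(attn.unsqueeze(1), img_feats).squeeze(1)  # attended visual evidence
        fused = torch.relu(self.fusion(torch.cat([attended, kb_fact], dim=-1)))
        return self.gate(fused, child_state)                           # forget/update via gating
```

In such a design, the root node's output state would be fed to an answer classifier, while intermediate nodes expose their attention maps for inspection, which is what makes the reasoning process interpretable.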

Highlights

  • Visual question answering (VQA) is a field at the intersection of computer vision and natural language processing that has emerged only in recent years

  • Image features extracted by a convolutional neural network (CNN) are fused with the encoded text features, and the fused features are fed to an artificial neural network

  • We propose a reasoning model based on a question-guided tree structure with a knowledge base (QGTSKB) to address these problems


Summary

Introduction

Visual question answering (VQA) is a field at the intersection of computer vision and natural language processing that has emerged only in recent years. In existing approaches, image features extracted by a convolutional neural network (CNN) are fused with encoded text features, and the fused features are fed to an artificial neural network; these methods turn visual question answering into a multi-label classification task. We propose a reasoning model based on a question-guided tree structure with a knowledge base (QGTSKB) to address these problems. The attention model uses the word encodings of the current tree node and fuses the attention map of the child node with the relationship between words from the knowledge base to extract local visual evidence for explicit reasoning. The attention map of each node serves as a qualitative result in the process of explicit visual reasoning, which further shows that our model is interpretable and adapts well to different tasks.
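To make the attention model's role concrete, the sketch below shows one way a node-level attention could fuse the current node's word encoding, a knowledge-base relation embedding, and the child node's attention map into a refined attention map over image regions. This is a hedged illustration under those assumptions; NodeAttention and its arguments are hypothetical names, not taken from the paper.

```python
import torch
import torch.nn as nn


class NodeAttention(nn.Module):
    """Re-attend over image regions at one tree node, conditioned on the child's map."""

    def __init__(self, vis_dim, txt_dim, rel_dim):
        super().__init__()
        self.score = nn.Linear(vis_dim + txt_dim + rel_dim + 1, 1)

    def forward(self, img_feats, word_emb, rel_emb, child_attn):
        # img_feats:  (B, R, vis_dim) region features
        # word_emb:   (B, txt_dim)    word encoding of the current tree node
        # rel_emb:    (B, rel_dim)    KB embedding of the relation between the node words
        # child_attn: (B, R)          attention map passed up from the child node
        R = img_feats.size(1)
        ctx = torch.cat([word_emb, rel_emb], dim=-1).unsqueeze(1).expand(-1, R, -1)
        x = torch.cat([img_feats, ctx, child_attn.unsqueeze(-1)], dim=-1)
        return torch.softmax(self.score(x).squeeze(-1), dim=1)  # refined (B, R) attention map
```

The returned map can then be visualized per node, which is the kind of qualitative evidence the visual reasoning experiments rely on.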

Visual Question Answering
Knowledge Base
Neural Module Network
Approach
Overview
Attention Model
Reasoning Model
Knowledge-Based Fact Model
Answer Prediction
Datasets
Implementation Details
Comparison with Existing Methods
Visual Reasoning
Findings
Conclusions