Multi-scale relation reasoning for multi-modal Visual Question Answering

Yirui Wu, Yuntao Ma, Shaohua Wan

doi:10.1016/j.image.2021.116319

Abstract

The goal of Visual Question Answering (VQA) is to answer questions about images. The same picture can prompt completely different types of questions, so the main difficulty of the VQA task lies in reasoning properly about the relationships among multiple visual objects according to the type of input question. To address this difficulty, this paper proposes a deep neural network that performs multi-modal relation reasoning at multiple scales and constructs a regional attention scheme to focus on informative, question-related regions for better answering. Specifically, we first design a regional attention scheme that selects regions of interest based on an informativeness evaluation computed by a question-guided soft attention module. The features computed by the regional attention scheme are then fused in scaled combinations, generating more distinctive features with scalable information. Owing to the regional attention design and the multi-scale property, the proposed method is capable of describing scaled relationships from multi-modal inputs to offer accurate question-guided answers. Experiments on the VQA v1 and VQA v2 datasets show that the proposed method outperforms most existing methods.
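The two stages named in the abstract, question-guided regional attention followed by fusion in scaled combinations, can be illustrated with a minimal NumPy sketch. This is not the authors' network: the scaled dot-product scoring, the top-k selection, the mean pooling over each region group, and the element-wise product with the question vector are all assumptions chosen for brevity, and the names `regional_attention`, `multi_scale_fusion`, `top_k`, and `scales` are hypothetical.

```python
import itertools
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def regional_attention(regions, question, top_k):
    """Score each region against the question and keep the top_k regions.

    Assumption: relevance is a scaled dot product between region and
    question embeddings; the paper's attention module may differ.
    """
    scores = regions @ question / np.sqrt(question.shape[0])
    weights = softmax(scores)                      # soft attention weights
    keep = np.sort(np.argsort(weights)[-top_k:])   # indices of the top_k regions
    return regions[keep] * weights[keep, None], weights, keep


def multi_scale_fusion(selected, question, scales=(2, 3)):
    """Fuse the selected regions in groups of several sizes ("scales").

    Assumption: each group of s regions is mean-pooled, modulated by the
    question through an element-wise product, and summed per scale.
    """
    fused = []
    for s in scales:
        combos = itertools.combinations(range(len(selected)), s)
        group_feats = [selected[list(c)].mean(axis=0) * question for c in combos]
        fused.append(np.sum(group_feats, axis=0))
    return np.concatenate(fused)                   # one feature block per scale


# Toy inputs: 6 region features and one question embedding, both 8-dimensional.
rng = np.random.default_rng(0)
regions = rng.normal(size=(6, 8))
question = rng.normal(size=8)

selected, weights, keep = regional_attention(regions, question, top_k=4)
fused = multi_scale_fusion(selected, question)
```

In this sketch, `fused` concatenates one feature vector per scale (pairs and triples of regions), and in a full model it would be passed to an answer classifier. A trained model would also use learned projections in place of the raw dot product and element-wise product.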

Journal: Signal Processing: Image Communication
Publication Date: May 14, 2021
Citations: 31

Similar Papers

Effects of Teacher Question Types on Developing L2 Learners’ English Ability and Creativity
Jee Hyun Ma
Studies in English Education | VOL. 22
30 Jun 2017

Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Aishwarya Agrawal ... Dhruv Batra
01 Jun 2018

Relation-Aware Image Captioning for Explainable Visual Question Answering
Ching-Shan Tseng ... Ying-Jia Lin
01 Dec 2022

A Novel Approach on Visual Question Answering by Parameter Prediction using Faster Region Based Convolutional Neural Network
Sudan Jha ... Vijender Kumar-Solanki
International Journal of Interactive Multimedia and Artificial Intelligence | VOL. 5
01 Jan 2019
