Abstract

We introduce a new neural network architecture, Multimodal Neural Graph Memory Networks (MN-GMN), for visual question answering. Our approach uses a graph structure, with features from different image regions as node attributes, and applies a recently proposed, powerful graph neural network model, the Graph Network (GN), to reason about objects and their interactions in the scene context. The input module of the MN-GMN generates a set of visual features plus a set of region-grounded captions (RGCs) for the image. The RGCs capture object attributes and their relationships. Two GNs are constructed from the input module, one over the visual features and one over the RGCs. Each node of the GNs iteratively computes a question-guided, contextualized representation of the visual/textual information assigned to it. To combine the information from both GNs, each node writes its updated representation to an external spatial memory. The final states of the memory cells are fed into an answer module to predict an answer. Experiments show that MN-GMN rivals the state-of-the-art models on the Visual7W, VQA-v2.0, and CLEVR datasets.
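To make the described pipeline concrete, here is a minimal sketch of the data flow in PyTorch. The class and parameter names (MNGMNSketch, memory_write, a 49-cell memory grid), the simplified question-conditioned node update, and the GRU-style memory write are illustrative assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MNGMNSketch(nn.Module):
    """High-level data flow only: two graphs (visual features, RGC embeddings),
    node states written into a grid of spatial memory cells, then an answer head."""

    def __init__(self, dim=512, n_answers=3000, grid_cells=49):
        super().__init__()
        # Stand-ins for the per-graph node updaters; the paper uses Graph Networks here.
        self.visual_gn = nn.Linear(2 * dim, dim)
        self.caption_gn = nn.Linear(2 * dim, dim)
        self.memory_write = nn.GRUCell(dim, dim)   # external spatial memory cells
        self.answer_head = nn.Linear(dim, n_answers)
        self.grid_cells = grid_cells

    def forward(self, vis_nodes, rgc_nodes, question, vis_cell_ids, rgc_cell_ids):
        # vis_nodes: (Nv, dim) region features; rgc_nodes: (Nt, dim) caption embeddings;
        # question: (dim,); *_cell_ids: index of the memory cell covering each node's region.
        memory = torch.zeros(self.grid_cells, vis_nodes.size(1))
        q = question.unsqueeze(0)
        # Question-guided node updates (stand-in for full GN message passing).
        vis = torch.relu(self.visual_gn(torch.cat([vis_nodes, q.expand_as(vis_nodes)], -1)))
        txt = torch.relu(self.caption_gn(torch.cat([rgc_nodes, q.expand_as(rgc_nodes)], -1)))
        # Each node writes its updated state into the memory cell covering its region.
        for states, cells in ((vis, vis_cell_ids), (txt, rgc_cell_ids)):
            for h, c in zip(states, cells):
                memory[c] = self.memory_write(h.unsqueeze(0), memory[c].unsqueeze(0))[0]
        # The final memory states feed the answer module.
        return self.answer_head(memory.mean(dim=0))
```

Routing both graphs through a shared spatial memory is what lets the visual and textual streams be combined by location before the answer module reads the result.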

Highlights

  • Visual question answering (VQA) has recently been introduced as a grand challenge for AI

  • This paper proposes a new neural network architecture for VQA based on the recently proposed Graph Network (GN) (Battaglia et al., 2018)

  • We introduce a new memory network architecture, based on graph neural networks, which can reason about complex arrangements of objects in a scene to answer visual questions


Summary

Introduction

Visual question answering (VQA) has recently been introduced as a grand challenge for AI. Answering questions about objects and their interactions in the scene context requires modeling the pairwise interactions between various regions of an image, as well as spatial context in both horizontal and vertical directions. Our new architecture (see Figure 2), the Multimodal Neural Graph Memory Network (MN-GMN), uses a graph structure to represent pairwise interactions between visual/textual features (nodes) from different regions of an image. GNs provide a context-aware neural mechanism for computing a feature for each node that captures its complex interactions with other nodes. This enables the MN-GMN to answer questions that require reasoning about complex arrangements of objects in a scene.
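As an illustration of that context-aware mechanism, the sketch below shows one round of question-guided message passing over a fully connected graph of region nodes, roughly in the spirit of a Graph Network edge/node update. The layer structure, MLP sizes, and mean aggregation are assumptions made for this example, not the exact formulation in the paper.

```python
import torch
import torch.nn as nn

class QuestionGuidedGNLayer(nn.Module):
    """One round of message passing over a fully connected graph of regions:
    every pair of nodes exchanges a message conditioned on the question, and
    each node aggregates its incoming messages into a context-aware feature."""

    def __init__(self, dim):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU())
        self.node_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, nodes, question):
        # nodes: (N, dim) region features; question: (dim,) question encoding.
        n, d = nodes.shape
        q = question.view(1, 1, d).expand(n, n, d)
        receivers = nodes.view(n, 1, d).expand(n, n, d)
        senders = nodes.view(1, n, d).expand(n, n, d)
        # Edge update: message from sender j to receiver i, conditioned on the question.
        messages = self.edge_mlp(torch.cat([receivers, senders, q], dim=-1))
        # Node update: combine each node with the mean of its incoming messages.
        incoming = messages.mean(dim=1)
        return self.node_mlp(torch.cat([nodes, incoming], dim=-1))

# Example: 5 region nodes with 128-d features, updated for 3 rounds.
layer = QuestionGuidedGNLayer(128)
nodes, question = torch.randn(5, 128), torch.randn(128)
for _ in range(3):
    nodes = layer(nodes, question)
```

Repeating the update for several rounds lets information propagate beyond immediate neighbors, which is what supports reasoning over multi-object arrangements.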


