Abstract

The Textbook Question Answering (TQA) task requires answering questions by reasoning over both the given diagrams and the accompanying text. The task poses two main challenges. First, diagrams differ from natural images: similar shapes or color blocks may express different semantics, and diagrams within the same topic vary widely in appearance. This visual semantic ambiguity and variable visual appearance make diagram understanding particularly difficult. Second, the text belongs to a specific educational domain rich in terminology, which creates a large gap with the general domain, so it is difficult to represent the text semantics effectively with a text encoder pretrained on general-domain corpora. In this paper, we propose a Spatial-Semantic Collaborative Graph Network (SSCGN) for the TQA task, which enhances diagram and text understanding and facilitates multimodal reasoning. Specifically, the Spatial-guided Semantic Enhancing (SSE) module fully exploits the spatial and semantic relationships between visual objects and OCR tokens to collaboratively enhance diagram semantic understanding. Moreover, building on the semantically enhanced region representations from the SSE module, the Fine-grained Spatial-Aware Graph Network (FSA-GN) captures more fine-grained spatial relationships to obtain richer relation-aware region representations for joint reasoning. We further propose multiple self-supervised auxiliary tasks that pretrain the diagram encoder and text encoder to strengthen the initial diagram and text semantic representations. Extensive experiments and ablation studies validate the effectiveness of SSCGN.
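To make the core idea of spatially biased graph reasoning over diagram regions concrete, the following is a minimal sketch in PyTorch. It is not the paper's actual FSA-GN implementation: the class name, the number of discrete spatial relation types, and the additive-bias fusion are all illustrative assumptions. The sketch only shows the general pattern of modulating graph attention between region features with pairwise spatial relation labels.

```python
# Minimal sketch of a spatial-aware graph attention layer (PyTorch).
# All names (SpatialAwareGraphLayer, num_spatial_relations, ...) are
# hypothetical and NOT taken from the paper; this illustrates the
# general technique of biasing region-to-region attention with
# discrete pairwise spatial relations.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialAwareGraphLayer(nn.Module):
    def __init__(self, dim: int, num_spatial_relations: int = 11):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        # One learned scalar bias per discrete spatial relation type
        # (e.g., "inside", "overlap", "left-of", ...).
        self.rel_bias = nn.Embedding(num_spatial_relations, 1)
        self.scale = dim ** -0.5

    def forward(self, regions: torch.Tensor, rel_ids: torch.Tensor) -> torch.Tensor:
        # regions: (N, dim) region features; rel_ids: (N, N) relation ids.
        q, k, v = self.query(regions), self.key(regions), self.value(regions)
        scores = (q @ k.t()) * self.scale                     # content affinity
        scores = scores + self.rel_bias(rel_ids).squeeze(-1)  # spatial bias
        attn = F.softmax(scores, dim=-1)
        return regions + attn @ v                             # residual update


# Toy usage: 5 regions with random pairwise spatial relation labels.
layer = SpatialAwareGraphLayer(dim=64)
feats = torch.randn(5, 64)
rels = torch.randint(0, 11, (5, 5))
out = layer(feats, rels)  # (5, 64) relation-aware region representations
```

In practice, several such layers could be stacked, with the relation labels derived from bounding-box geometry (containment, overlap, relative direction) of the detected visual objects and OCR tokens.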
