Abstract
The textbook question answering (TQA) task aims to infer answers to given questions from a multimodal context that includes text and diagrams. Existing studies aggregate intramodal semantics extracted from a single modality but have yet to capture the intermodal semantics between different modalities. A major challenge in learning intermodal semantics is bridging the semantic gap caused by modal heterogeneity while keeping intramodal semantics lossless. In this article, we propose an intermodal relation-aware heterogeneous graph network (IMR-HGN) that extracts intermodal semantics for TQA by aggregating different modalities during feature learning rather than representing each modality independently. First, we design a multidomain consistent representation (MDCR) that eliminates semantic gaps by capturing intermodal features while maintaining lossless intramodal semantics across multiple domains. Second, we present neighbor-based relation inpainting (NRI), which reduces semantic ambiguity by repairing fuzzy relations using the correlations among relations. Finally, we propose hierarchical multisemantics aggregation (HMSA), which ensures semantic completeness by aggregating node and relation features with a reconstruction network (RN). Experimental results show that IMR-HGN can extract the intermodal semantics of answers, achieving a 2.16% improvement on the validation set of the TQA dataset and a 3.04% improvement on the test set of the AI2D dataset.
More From: IEEE Transactions on Neural Networks and Learning Systems