Dual Path Multi-Modal High-Order Features for Textual Content based Visual Question Answering

Yanan Li, Yuetan Lin, Donghui Wang, Honghui Zhao

doi:10.1109/icpr48806.2021.9412231

Abstract

As a typical cross-modal problem, visual question answering (VQA) has received increasing attention from the computer vision and natural language processing communities. Reading and reasoning about text and visual content in images is a burgeoning and important research topic in VQA, especially for applications that assist the visually impaired. Given an image, the task is to predict an answer to a natural language question that is closely related to the image's textual content. In this paper, we propose a novel end-to-end textual content based VQA model, which grounds question answering in both visual and textual information. After encoding the image, the question, and the recognized text words, the model uses multi-modal factorized high-order modules and an attention mechanism to fuse question-image and question-text features, respectively, so that the complex correlations among different features can be captured efficiently. To ensure the model's extensibility, it embeds candidate answers and recognized texts in a semantic embedding space and adopts a semantic embedding based classifier to perform answer prediction. Extensive experiments on the newly proposed TextVQA benchmark demonstrate that the proposed model achieves promising results.
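The abstract does not give the model's equations, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes an MFB/MFH-style factorized bilinear block (project both inputs, multiply element-wise, sum-pool, normalize), chains two such blocks to obtain higher-order features, applies this to a question-image path and a question-text path, and scores candidates by a dot product in a shared embedding space. The attention step is omitted. All function names, dimensions, and the random weights are hypothetical.

```python
import numpy as np

def factorized_bilinear(x, y, U, V, k, prev=None):
    """One factorized bilinear block (MFB-style; an assumption, not from the paper).

    x: (dx,), y: (dy,), U: (dx, o*k), V: (dy, o*k).
    Returns the pooled (o,) feature and the expanded (o*k,) feature.
    """
    expanded = (x @ U) * (y @ V)            # element-wise product in the expanded space
    if prev is not None:
        expanded = expanded * prev          # chain with the previous block for higher order
    pooled = expanded.reshape(-1, k).sum(axis=1)          # sum-pool over windows of size k
    pooled = np.sign(pooled) * np.sqrt(np.abs(pooled))    # power normalization
    pooled = pooled / (np.linalg.norm(pooled) + 1e-12)    # L2 normalization
    return pooled, expanded

def high_order_fuse(x, y, blocks, k):
    """Cascade several blocks and concatenate their pooled outputs."""
    outputs, prev = [], None
    for U, V in blocks:
        pooled, prev = factorized_bilinear(x, y, U, V, k, prev)
        outputs.append(pooled)
    return np.concatenate(outputs)

def score_candidates(fused, W, candidate_embeddings):
    """Project the fused feature into the embedding space and score each candidate."""
    query = fused @ W                       # (e,)
    return candidate_embeddings @ query     # (n_candidates,)

# Hypothetical sizes and random stand-ins for encoder outputs and learned weights.
rng = np.random.default_rng(0)
dq, dv, dt, o, k, e, order = 8, 10, 6, 4, 3, 5, 2
n_candidates = 7                            # e.g. fixed answers plus recognized text words

q = rng.normal(size=dq)                     # question feature
v = rng.normal(size=dv)                     # image feature
t = rng.normal(size=dt)                     # recognized-text feature

def make_blocks(dx, dy):
    return [(rng.normal(size=(dx, o * k)), rng.normal(size=(dy, o * k)))
            for _ in range(order)]

fused_qv = high_order_fuse(q, v, make_blocks(dq, dv), k)   # question-image path
fused_qt = high_order_fuse(q, t, make_blocks(dq, dt), k)   # question-text path
joint = np.concatenate([fused_qv, fused_qt])

W = rng.normal(size=(joint.shape[0], e))
candidate_embeddings = rng.normal(size=(n_candidates, e))
scores = score_candidates(joint, W, candidate_embeddings)
predicted = int(np.argmax(scores))          # index of the highest-scoring candidate
```

Scoring against embeddings rather than a fixed softmax layer is what lets the candidate set change per image: recognized text words can be appended to `candidate_embeddings` without retraining an output layer, which is one plausible reading of the extensibility claim.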

