Beyond OCR + VQA: Towards end-to-end reading and reasoning for robust and accurate TextVQA

Gangyan Zeng, Yuan Zhang, Yu Zhou, Xiaomeng Yang, Ning Jiang, Guoqing Zhao, Weiping Wang, Xu-Cheng Yin

doi:10.1016/j.patcog.2023.109337

Abstract

Text-based visual question answering (TextVQA), which answers a visual question by considering both visual content and scene text, has attracted increasing attention recently. Most existing methods employ an optical character recognition (OCR) module as a pre-processor to read text, then combine it with a visual question answering (VQA) framework. However, inaccurate OCR results can propagate and accumulate errors downstream, and the correlation between text reading and text-based reasoning is not fully exploited. In this work, we integrate OCR into the flow of TextVQA, targeting the mutual reinforcement of the OCR and VQA tasks. Specifically, a visually enhanced text embedding module is proposed to predict semantic features from the visual information of texts, so that texts can be reasonably understood even without accurate recognition. Further, two schemes are developed to leverage contextual information in VQA to modify OCR results. First, a reading modification module adaptively selects the answer results according to the context. Second, we propose an efficient end-to-end text reading and reasoning network, in which the downstream VQA signal contributes to the optimization of text reading. Extensive experiments show that our method outperforms existing alternatives in terms of accuracy and robustness, whether or not ground-truth OCR annotations are used.

Journal: Pattern Recognition
Publication Date: Jan 21, 2023
Citations: 9

Similar Papers

OCR-VQA: Visual Question Answering by Reading Text in Images
Anand Mishra ... Ajeet Kumar Singh
01 Sep 2019

Improving Automatic VQA Evaluation Using Large Language Models
Oscar Mañas ... Aishwarya Agrawal
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 38
24 Mar 2024

MobiVQA
Qingqing Cao ... Prerna Khanna
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies | VOL. 6
04 Jul 2022

Neural Networks for Detecting Irrelevant Questions During Visual Question Answering
Mengdi Li ... Cornelius Weber
01 Jan 2020
