Abstract
This work addresses the problem of Question Answering (QA) on handwritten document collections. Unlike typical QA and Visual Question Answering (VQA) formulations, where the answer is a short text, we aim to locate a document snippet where the answer lies. The proposed approach works without recognizing the text in the documents. We argue that this recognition-free approach is suitable for handwritten documents and historical collections, where robust text recognition is often difficult. At the same time, for human users, document image snippets containing answers are a valid alternative to textual answers. The proposed approach uses an off-the-shelf deep embedding network that projects both textual words and word images into a common sub-space. This embedding bridges the textual and visual domains and helps us retrieve document snippets that potentially answer a question. We evaluate the proposed approach on two new datasets: (i) HW-SQuAD, a synthetic, handwritten document image counterpart of the SQuAD 1.0 dataset, and (ii) BenthamQA, a smaller set of QA pairs defined on documents from the popular Bentham manuscripts collection. We also present a thorough analysis of the recognition-free approach compared to a recognition-based approach that uses text recognized from the images with OCR. The datasets presented in this work are available for download at docvqa.org.
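To make the core retrieval idea concrete, the sketch below shows, in toy form, how projecting question words and word images into a common embedding space enables recognition-free snippet retrieval. All names here (embed_text, embed_word_image, retrieve_snippet, the hash-based placeholder embedder, the window size) are illustrative assumptions, not the paper's implementation; in the actual approach a learned deep network supplies the embeddings.

```python
# Illustrative sketch of recognition-free snippet retrieval via a shared
# text/image embedding space. The placeholder embedder below is NOT the
# paper's network; it only lets the demo run end to end.
import hashlib
import numpy as np

DIM = 64  # embedding dimensionality (arbitrary for this sketch)

def _placeholder_embed(key: str) -> np.ndarray:
    """Deterministic stand-in for the learned embedding network.
    The real network maps a textual word or a word image into the
    common sub-space; here we just hash a string key to a unit vector."""
    seed = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % (2**32)
    vec = np.random.default_rng(seed).standard_normal(DIM)
    return vec / np.linalg.norm(vec)

def embed_text(word: str) -> np.ndarray:
    """Project a textual query word into the common sub-space."""
    return _placeholder_embed("txt:" + word.lower())

def embed_word_image(word_image_id: str) -> np.ndarray:
    """Project a segmented word image into the common sub-space.
    The real network consumes pixels; this demo uses an opaque ID."""
    return _placeholder_embed("img:" + word_image_id)

def retrieve_snippet(question_words, word_image_ids, window=5):
    """Return the (start, end) word-index range of the document snippet
    whose word images best match the question words."""
    q = [embed_text(w) for w in question_words]
    d = [embed_word_image(i) for i in word_image_ids]
    # Per-word relevance: best cosine similarity to any question word
    # (vectors are unit-norm, so a dot product is the cosine).
    scores = [max(float(np.dot(qv, dv)) for qv in q) for dv in d]
    # Slide a fixed-size window over the page; keep the best-scoring one.
    window = min(window, len(scores))
    start = max(range(len(scores) - window + 1),
                key=lambda i: sum(scores[i:i + window]))
    return start, start + window
```

Because matching happens entirely in the embedding space, no transcription of the handwriting is ever produced, which is what makes the approach robust on collections where OCR or handwriting recognition is unreliable.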