Advancements in Document QnA: A Comprehensive Survey

Likhith V

doi:10.22214/ijraset.2024.57926

Abstract

The growing demand for effective document information-extraction methods has highlighted the need to handle semi-structured tables and diverse content formats. This survey examines the task of extracting information from documents. It places particular emphasis on the challenges of precise Key Information Extraction (KIE) and on their broader implications for efficient document understanding. It reviews recent advances in this domain, focusing on notable approaches such as BROS, BloombergGPT, and the Document Understanding Transformer (DonUT). It also analyzes a range of studies in KIE and Visual Document Understanding (VDU), discussing the strengths and weaknesses of each. DonUT receives special attention because of its OCR-free, Transformer-based VDU architecture. That model is pre-trained with an objective that minimizes the cross-entropy loss of next-token prediction. Beyond addressing current challenges, the survey identifies promising directions for advancing document text-extraction techniques.
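The cross-entropy pre-training objective mentioned in the abstract can be made concrete with a short example. The following is a minimal Python sketch of the mean next-token cross-entropy, L = -(1/T) Σ_t log p(y_t | y_<t, x). It assumes a decoder has already produced one vector of unnormalized scores (logits) per target token, each conditioned on the document image x and the preceding tokens. The function name `next_token_cross_entropy` and its input layout are hypothetical and chosen for illustration. This is not DonUT's actual implementation, which computes the same quantity on batched tensors inside a deep-learning framework.

```python
import math
from typing import Sequence


def next_token_cross_entropy(logits: Sequence[Sequence[float]],
                             targets: Sequence[int]) -> float:
    """Mean cross-entropy of next-token prediction (illustrative sketch).

    logits[t]  -- hypothetical decoder scores over the vocabulary at step t,
                  assumed already conditioned on the image and tokens before t.
    targets[t] -- id of the ground-truth token at step t.
    """
    if not targets or len(logits) != len(targets):
        raise ValueError("need one logit vector per target token")
    total = 0.0
    for step_logits, target in zip(logits, targets):
        # Numerically stable log-sum-exp: subtract the max before exponentiating.
        m = max(step_logits)
        log_norm = m + math.log(sum(math.exp(z - m) for z in step_logits))
        # -log softmax(step_logits)[target]
        total += log_norm - step_logits[target]
    return total / len(targets)
```

As a usage example, uniform logits over a four-token vocabulary, such as `next_token_cross_entropy([[0.0] * 4, [0.0] * 4], [1, 3])`, give log 4 ≈ 1.386. That is the loss of a model that assigns equal probability to every token. Confident correct predictions drive the loss toward zero.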
