This paper presents a comprehensive survey of over 100 research works on form understanding in scanned documents. We review recent advances and breakthroughs in the field, with a particular focus on transformer-based models, which have been reported to improve accuracy on form understanding tasks by up to 25% over traditional methods. Our methodology is based on an in-depth analysis of influential publications and research trends over the last decade, covering 15 state-of-the-art models and 10 benchmark datasets. By examining these works, we offer novel insights into the evolution of this domain. In particular, we highlight how transformers have revolutionized form understanding by improving the ability to process noisy scanned documents and by yielding substantial gains in OCR accuracy. We also provide an overview of the most relevant datasets, such as FUNSD, CORD, and SROIE, which serve as benchmarks for evaluating model performance. By comparing the capabilities of these models, which show an average improvement of 10–15% on key form extraction tasks, we aim to give researchers and practitioners practical guidance for selecting the most suitable solutions for their form understanding applications.