ABSTRACT

Document processing and related tasks such as information extraction account for a large portion of business workloads and therefore offer high potential for efficiency improvements and process automation. Although semistructured documents such as invoices and receipts have been studied extensively, tax notices remain a largely unexplored area due to their information richness and layout complexity. This article presents a study investigating the potential of deep learning-based information extraction for German real estate tax notices. We propose a hybrid approach that combines the LayoutLM transformer model, which draws on textual, positional, and visual information, with rule-based information extraction. Using a German tax consulting firm as an example, we show how such a system can be developed and integrated into an organization's business workflow. We also discuss the implications and key challenges that organizations face when planning to adopt such technologies.

Data Availability: Custom data and code are not publicly available. Relevant public source code is cited in the text.

JEL Classifications: M15; M40; M41.