Abstract
Table structure recognition (TSR) aims to convert tables in images into machine-understandable formats. Current approaches address this task either by predicting the adjacency of detected cells or by directly generating structural sequences. However, these approaches either rely on additional heuristic rules for post-processing or require generating extremely long sequences, which increases computational cost. In this paper, we redefine TSR as a LOgical location REgression paradigm, which effectively captures the inherent logical dependencies and constraints among table cells. Correspondingly, we propose LORE, a novel approach for TSR. LORE simultaneously predicts the geometric coordinates of table cells and the logical structure of the entire table. LORE is conceptually simpler, easier to train, and more accurate than other TSR paradigms. Moreover, to enhance the model’s spatial and logical representation capabilities, we propose two pre-training tasks, resulting in an upgraded version named LORE++. Pre-training yields substantial improvements in accuracy, generalization, and few-shot capability over the original LORE. Experiments on standard benchmarks demonstrate the superiority of LORE++ and highlight the promise of the logical location regression paradigm for TSR.
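To make the idea of pairing geometric and logical locations concrete, the following is a minimal illustrative sketch, not the LORE implementation. It assumes that a cell's logical location is encoded as inclusive start and end indices for rows and columns, an encoding the abstract does not spell out. The names `Cell` and `build_grid` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    # Geometric location: bounding box in pixels (x_min, y_min, x_max, y_max).
    box: Tuple[int, int, int, int]
    # Logical location (assumed encoding): inclusive row and column index ranges.
    row_start: int
    row_end: int
    col_start: int
    col_end: int


def build_grid(cells: List[Cell]) -> List[List[Optional[int]]]:
    """Lay cells out on a row/column grid using only their logical locations.

    Each grid slot holds the index of the cell covering it, or None if empty.
    """
    n_rows = max(c.row_end for c in cells) + 1
    n_cols = max(c.col_end for c in cells) + 1
    grid: List[List[Optional[int]]] = [[None] * n_cols for _ in range(n_rows)]
    for idx, cell in enumerate(cells):
        for r in range(cell.row_start, cell.row_end + 1):
            for c in range(cell.col_start, cell.col_end + 1):
                if grid[r][c] is not None:
                    # Two cells claiming one slot violates a table constraint.
                    raise ValueError(f"cells {grid[r][c]} and {idx} overlap at ({r}, {c})")
                grid[r][c] = idx
    return grid


# A two-row table whose header cell spans both columns.
cells = [
    Cell(box=(0, 0, 200, 20), row_start=0, row_end=0, col_start=0, col_end=1),
    Cell(box=(0, 20, 100, 40), row_start=1, row_end=1, col_start=0, col_end=0),
    Cell(box=(100, 20, 200, 40), row_start=1, row_end=1, col_start=1, col_end=1),
]
grid = build_grid(cells)
```

Under this assumed encoding, the table structure follows directly from the per-cell indices, with no adjacency-based post-processing and no long output sequence to decode.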