Abstract

Tables, especially when having complex layouts, contain rich semantic information. However, effectively learning from tables to uncover such semantic information remains challenging. The rapid progress in natural language processing does not necessarily correspond to equivalent advancements in table parsing, which often requires joint visual and language modeling. Indeed, humans can quickly derive semantic meaning from table entries by associating them with corresponding column and/or row headers. Motivated by this observation, we propose a new heterogeneous Graph-based Table Representation Learning (GTRL) framework. GTRL combines graph-based visual modeling with sequence-based language modeling to learn granular per-cell embeddings that are sensitive to the semantic meaning of cells within their corresponding table context. We systematically evaluate the proposed GTRL framework using two datasets: a new adhesive table benchmark comprising complex tables extracted from industrial documents for learning per-entry semantics, and a publicly available large-scale dataset that enables learning header semantics from column tables. Experimental results demonstrate the competitive performance of the proposed GTRL, which often exhibits reduced computational complexity compared to state-of-the-art table representation learning models.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.