Abstract
Growing concern with online misinformation has encouraged NLP research on fact verification. Since writers often base their assertions on structured data, we focus here on verifying textual statements given evidence in tables. Starting from the Table Parsing (TAPAS) model developed for question answering (Herzig et al., 2020), we find that modeling table structure improves a language model pre-trained on unstructured text. Pre-training language models on English Wikipedia table data further improves performance. Pre-training on a question answering task with column-level cell rank information achieves the best performance. With improved pre-training and cell embeddings, this approach outperforms the state-of-the-art Numerically-aware Graph Neural Network table fact verification model (GNN-TabFact), increasing statement classification accuracy from 72.2% to 73.9% even without modeling numerical information. Incorporating numerical information with cell rankings and pre-training on a question-answering task increases accuracy to 76%. We further analyze accuracy on statements implicating single rows or multiple rows and columns of tables, on different numerical reasoning subtasks, and on generalizing to detecting errors in statements derived from the ToTTo table-to-text generation dataset.
Highlights
The rapid growth in the amount and sources of online textual content has raised concerns about misinformation and its potential harmful impacts on society when quickly spread to a massive audience
We propose to adapt the Table Parsing (TAPAS) model (Herzig et al., 2020), which has proven effective in question answering over tables, to model tables for fact verification
The TAPAS-Row-Col-Rank model pre-trained on the question answering task over tables achieves the best performance
Summary
The rapid growth in the amount and sources of online textual content has raised concerns about misinformation and its potential harmful impacts on society when quickly spread to a massive audience. Concerns about misinformation have stimulated extensive research on automatic fact verification, i.e., verifying whether a given textual statement is entailed or refuted by given evidence. Chen et al. (2019) introduced a large-scale dataset, TabFact, for verifying statements against structured evidence in tables. Traditional language models trained on unstructured text cannot be directly applied to learn representations of structured data. Detecting misinformation with structured evidence involves both linguistic inference and numerical reasoning such as addition, subtraction, sorting, and counting.
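To make the column-level cell rank idea concrete, the following is an illustrative sketch (not the paper's actual code) of how per-column numeric ranks could be computed before being fed to a model as rank embeddings. The function name `column_ranks` and the convention of rank 0 for non-numeric cells are assumptions for illustration.

```python
def column_ranks(column):
    """Return a 1-based rank per cell for numeric values in one table column.

    Non-numeric cells (e.g. "n/a") receive rank 0, i.e. no rank information.
    This mirrors the intuition behind column-level cell rank embeddings:
    the model sees each numeric cell's position in the column's sort order.
    """
    # Parse the cells that can be interpreted as numbers.
    numeric = {}
    for i, cell in enumerate(column):
        try:
            numeric[i] = float(cell)
        except (TypeError, ValueError):
            pass

    # Sort numeric cell indices by value and assign 1-based ranks.
    order = sorted(numeric, key=numeric.get)
    ranks = [0] * len(column)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

# Example: a hypothetical "points" column from a table.
print(column_ranks(["12", "7", "n/a", "30"]))  # [2, 1, 0, 3]
```

With ranks like these attached to each cell token, statements that require sorting or comparison (e.g. "team X scored the most points") can be verified without the model having to compare raw magnitudes directly.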