Abstract
This paper tackles the automatic detection of contradictions in Spanish within the news domain. Pairs of pieces of information are classified as compatible, contradictory, or unrelated. To address the task, the ES-Contradiction dataset was created. The dataset contains a balanced number of examples of each of the three classes. The novelty of the research is the fine-grained annotation of the different types of contradiction in the dataset. At present, the contradiction examples cover four types: negation, antonyms, numerical, and structural. Future work will extend the dataset to cover all possible types of contradiction. To validate the dataset, a pretrained model (BETO) is used; across the experiments performed, the system detects contradictions with an F1m of 92.47%. By type of contradiction, the best results are obtained for negation (F1m = 98%), whereas structural contradictions obtain the lowest results (F1m = 69%) because the dataset contains fewer structural examples, owing to the complexity of generating them. On a more general dataset such as XNLI, the system trained on our dataset fails to detect most contradictions properly, because the two datasets differ greatly in size and our dataset covers only four types of contradiction. Nevertheless, the classification of contradictions leads us to conclude that some highly complex contradictions will require external knowledge to be detected properly, which would avoid the need for the system to have been previously exposed to them.
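As an illustration of how a score such as the F1m reported above might be computed for the three-class task, the following is a minimal sketch. It assumes that F1m denotes the macro-averaged F1 (the unweighted mean of the per-class F1 scores), which the abstract does not state explicitly. The label strings, the function name `macro_f1`, and the toy gold and predicted labels are hypothetical and are not taken from the ES-Contradiction dataset.

```python
from collections import Counter

# Hypothetical label names for the three classes described in the abstract.
LABELS = ("compatible", "contradiction", "unrelated")


def macro_f1(gold, predicted, labels=LABELS):
    """Return the unweighted mean of per-class F1 scores (assumed meaning of F1m)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (invented data, for illustration only).
gold = ["compatible", "contradiction", "unrelated", "contradiction"]
pred = ["compatible", "contradiction", "contradiction", "contradiction"]
score = macro_f1(gold, pred)
print(f"macro F1 = {score:.2f}")
```

Because every class contributes equally to the mean, a class with few examples and a low score (such as the structural contradictions mentioned above, if scored per type) can lower the macro average noticeably.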
Highlights
Accepted: 26 March 2021
One of the worst problems in the current information society is disinformation
The model used is based on the BERT [22] model, and it performs a series of optimizations similar to those performed in the RoBERTa model [23]
In the ES-Contradiction dataset, contradictions are annotated with a fine-grained annotation that distinguishes the type of contradiction according to its specific characteristics
Summary
One of the worst problems in the current information society is disinformation. It is a wide-ranging problem that refers to the inaccuracy and lack of veracity of information that deliberately seeks to deceive or misdirect [1]. This phenomenon spreads on a viral scale and can result in massive confusion about the real facts. In natural language processing (NLP), the task of contradiction identification involves detecting natural language statements that convey information about events or actions that cannot simultaneously hold [4].