A semi-automatic annotation methodology that combines Summarization and Human-In-The-Loop to create disinformation detection resources

Alba Bonet-Jover, Robiert Sepúlveda-Torres, Estela Saquete, Patricio Martínez-Barco

doi:10.1016/j.knosys.2023.110723

Open Access

Journal: Knowledge-Based Systems
Publication Date: Jun 16, 2023
Citations: 1
License type: cc-by

Affiliation: University of Alicante

Abstract

Early detection of disinformation is one of the most challenging large-scale problems facing present-day society, which is why technologies such as Artificial Intelligence and Natural Language Processing are needed. The vast majority of Artificial Intelligence approaches require annotated data, and generating these resources is very expensive. This proposal aims to improve the efficiency of the annotation process with a two-level semi-automatic annotation methodology. The first level extracts relevant information through summarization techniques. The second applies a Human-in-the-Loop strategy whereby the labels are pre-annotated by the machine, corrected by the human, and reused by the machine to retrain the automatic annotator. In the evaluation of the system, the average annotation time per news item is reduced by 50%. In addition, a set of experiments is performed on the generated semi-automatically annotated dataset to demonstrate the effectiveness of the proposal. Although the dataset is annotated in terms of unreliable content, it is applied to the veracity detection task with very promising results (0.95 accuracy in reliability detection and 0.78 in veracity detection).
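
The two-level loop described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the lead-k summarizer stands in for the unspecified summarization techniques, `WordVoteAnnotator` is a toy word-voting classifier standing in for the automatic annotator, the label names `reliable` and `unreliable` are placeholders, and `human_correct` is a hypothetical callback that plays the role of the human annotator.

```python
from collections import Counter


def lead_summary(text, k=2):
    """Level 1 stand-in (assumption): keep the first k sentences of the item."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:k])


class WordVoteAnnotator:
    """Toy annotator (assumption): each known word votes for the labels it was seen with."""

    def __init__(self):
        self.counts = {}  # word -> Counter mapping label to frequency

    def fit(self, examples):
        # Retrain from scratch on all (text, label) pairs collected so far.
        self.counts = {}
        for text, label in examples:
            for word in set(text.lower().split()):
                self.counts.setdefault(word, Counter())[label] += 1

    def predict(self, text, default="reliable"):
        votes = Counter()
        for word in set(text.lower().split()):
            votes.update(self.counts.get(word, {}))
        return votes.most_common(1)[0][0] if votes else default


def human_in_the_loop(batches, human_correct, seed_examples):
    """Level 2: the machine pre-annotates, the human corrects, the machine retrains."""
    annotated = list(seed_examples)
    model = WordVoteAnnotator()
    model.fit(annotated)
    corrections = 0
    for batch in batches:
        for text in batch:
            summary = lead_summary(text)                  # level 1: reduce the item
            proposed = model.predict(summary)             # machine pre-annotation
            final = human_correct(summary, proposed)      # human review
            corrections += int(final != proposed)
            annotated.append((summary, final))
        model.fit(annotated)                              # reuse corrected labels
    return model, annotated, corrections
```

In this sketch, the number of corrections per batch serves as a rough proxy for the human effort that remains. A real system would replace both stand-ins with trained summarization and classification models.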

Full Text