Abstract

This article investigates the integration of machine learning into the political claim annotation workflow, with the goal of partially automating the annotation and analysis of large text corpora. It introduces the MARDY annotation environment and presents results from an experiment comparing the annotation quality of annotators with and without machine-learning-based annotation support. The design and setting aim to measure and evaluate: a) annotation speed; b) annotation quality; and c) applicability to the use case of discourse network generation. While the results indicate only slight increases in annotation speed, the authors find a moderate boost in annotation quality. Additionally, with manual annotation of the actors and filtering out of false positives, the machine-learning-based annotation suggestions allow the authors to fully recover the core network of the discourse as extracted from the articles annotated during the experiment. This is due to the redundancy naturally present in the annotated texts. Thus, assuming a research focus not on the complete network but on the network core, AI-based annotation can provide reliable information about discourse networks with much less human intervention than the traditional manual approach.

Highlights

  • Discourse network analysis (DNA) offers a conceptual framework for the analysis of discourse structures and dynamics

  • A natural question to ask is how good the AI annotator is. This question is answered in two steps: Section 4 provides more details on the technical side of the AI pseudo-annotator and discusses its performance from a Natural Language Processing (NLP) perspective; Section 5 presents the results of a computer-assisted annotation experiment in which the AI is employed to suggest relevant claims to the annotators

  • While the integration of machine learning in annotation workflows has been suggested before, no working systems have yet been developed that leverage machine learning for corpus creation and text selection and for the actual annotation of texts using complex and multifaceted abstract categories


Summary

Introduction

Discourse network analysis (DNA) offers a conceptual framework for the analysis of discourse structures and dynamics. The second group of approaches tries to capture complex meaning structures on a more fine-grained level. They usually rely on more or less extensive annotation of the raw text material by human annotators, following a codebook that provides categories at a certain level of abstraction from the original text in order to identify political claims (Koopmans & Statham, 2010), frames (D’Angelo & Kuypers, 2010), or evaluative statements (Schmidtke & Nullmeier, 2011). A natural question to ask is how good the AI annotator is. We will answer this question in two steps: in Section 4, after providing more details concerning the technical side of the AI pseudo-annotator, we will discuss its performance from a Natural Language Processing (NLP) perspective; in Section 5, we will present the results of a computer-assisted annotation experiment in which the AI will be employed to suggest relevant claims to the annotators (and not just to the experts in the gold merging stage).
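To make the discourse network idea concrete, the following is a minimal sketch of how claim annotations can be turned into an actor congruence network: two actors are linked for every claim category on which they take the same stance. The annotation triples and actor names are purely illustrative, not data from the study.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical annotations: (actor, claim category, stance) triples, as they
# might result from manual or AI-assisted claim annotation.
annotations = [
    ("Party A", "subsidize solar", "support"),
    ("Party B", "subsidize solar", "support"),
    ("Party A", "coal phase-out", "support"),
    ("Party B", "coal phase-out", "support"),
    ("Party C", "coal phase-out", "oppose"),
]

def congruence_network(annotations):
    """Count, per actor pair, the claims on which both take the same stance."""
    by_claim_stance = defaultdict(set)
    for actor, claim, stance in annotations:
        by_claim_stance[(claim, stance)].add(actor)
    edges = defaultdict(int)
    for actors in by_claim_stance.values():
        for a, b in combinations(sorted(actors), 2):
            edges[(a, b)] += 1
    return dict(edges)

print(congruence_network(annotations))
# → {('Party A', 'Party B'): 2}
```

Because the same actor–claim pairs tend to recur across many articles, edge weights are redundant in exactly the sense the abstract describes: dropping some annotations (e.g., false positives filtered out, or claims a classifier missed) still leaves the heavily weighted core of the network intact.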

The AI-Pseudo-Annotator
Developing Claim Identification and Classification Methods
Evaluation of Classifier Quality
Annotation Experiment
Discourse Networks
Conclusion