Abstract

To computationally model discourse phenomena such as argumentation we need corpora with reliable annotation of the phenomena under study. Annotating complex discourse phenomena poses two challenges: fuzziness of unit boundaries and the need for multiple annotators. We show that current metrics for inter-annotator agreement (IAA), such as P/R/F1 and Krippendorff’s α, provide inconsistent results for the same text. In addition, IAA metrics do not tell us what parts of a text are easier or harder for human judges to annotate, and so do not provide sufficiently specific information for evaluating systems that automatically identify discourse units. We propose a hierarchical clustering approach that aggregates overlapping text segments identified by multiple annotators; the more annotators who identify a text segment, the easier we assume that segment is to annotate. The clusters make it possible to quantify the extent of agreement judges show about text segments; this information can be used to assess the output of systems that automatically identify discourse units.
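The abstract does not spell out the clustering procedure, but a minimal sketch of the underlying idea might look like the following, assuming annotated spans arrive as (annotator, start, end) character offsets and that overlapping spans are grouped together; the single-link grouping rule and all names here are illustrative assumptions, not the authors' exact method.

```python
from collections import namedtuple

# Hypothetical representation of one annotated span: who marked it and where.
Span = namedtuple("Span", ["annotator", "start", "end"])

def overlaps(a, b):
    """True if two character-offset spans share at least one character."""
    return a.start < b.end and b.start < a.end

def cluster_spans(spans):
    """Greedy single-link grouping of overlapping spans, an illustrative
    stand-in for the paper's hierarchical clustering: spans connected by a
    chain of overlaps end up in the same cluster."""
    clusters = []
    for span in sorted(spans, key=lambda s: (s.start, s.end)):
        touching = [c for c in clusters if any(overlaps(span, s) for s in c)]
        for c in touching:
            clusters.remove(c)
        clusters.append(sum(touching, []) + [span])
    return clusters

def annotator_support(cluster):
    """Distinct annotators who marked a span in this cluster; higher support
    is taken to mean the unit is easier to annotate."""
    return len({s.annotator for s in cluster})

# Toy input: three judges roughly agree on one unit, only one judge marks another.
spans = [Span("A1", 10, 45), Span("A2", 12, 50), Span("A3", 14, 44),
         Span("A1", 60, 80)]
for cluster in cluster_spans(spans):
    print(annotator_support(cluster), [(s.start, s.end) for s in cluster])
```

On this toy input the first cluster has support 3 (an "easy" unit) and the second support 1 (a unit only one judge saw), which is the kind of graded information the clusters are meant to provide.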

Highlights

  • Annotation of discourse typically involves three subtasks: segmentation, segment classification and relation identification (Peldszus and Stede, 2013a)

  • The difficulty of achieving an Inter-Annotator Agreement (IAA) of .80, which is generally accepted as good agreement, is compounded in studies of discourse annotations since annotators must unitize, i.e. identify the boundaries of discourse units (Artstein and Poesio, 2008)

  • The need for annotators to identify the boundaries of text segments makes measurement of IAA more difficult because standard coefficients such as κ assume that the units to be coded have been identified before the coding begins (Artstein and Poesio, 2008); see the sketch after this list
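
The κ coefficient mentioned above is only defined over a fixed, shared set of units: each coder must assign exactly one label to every pre-identified item before agreement can be computed. A minimal sketch (with made-up labels rather than data from the study) makes that assumption explicit.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labelling the SAME pre-identified units.
    The two lists must be aligned item-for-item, which is exactly the
    assumption that breaks down when coders first have to find the units."""
    assert len(labels_a) == len(labels_b), "kappa needs one label per coder per unit"
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[l] * freq_b[l] for l in freq_a | freq_b) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical labels over ten already-segmented discourse units.
coder1 = ["claim", "premise", "premise", "claim", "premise",
          "claim", "premise", "premise", "claim", "premise"]
coder2 = ["claim", "premise", "claim", "claim", "premise",
          "claim", "premise", "premise", "premise", "premise"]
print(round(cohens_kappa(coder1, coder2), 2))  # 0.58 on this made-up data
```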


Summary

Introduction

Annotation of discourse typically involves three subtasks: segmentation (identifying discourse units, including their boundaries), segment classification (labeling the role of each discourse unit) and relation identification (indicating the links between discourse units) (Peldszus and Stede, 2013a). We show that methods for assessing IAA, such as the information-retrieval-inspired P/R/F1 approach (Wiebe et al., 2005) and Krippendorff’s α (Krippendorff, 1995; Krippendorff, 2004b), which was developed for content analysis in the social sciences, provide inconsistent results when applied to segmentations involving fuzzy boundaries and multiple coders. Nor do these metrics tell us which parts of a text are easier or harder to annotate, or help in choosing a reliable gold standard. We therefore propose hierarchical clustering of the overlapping text segments identified by multiple annotators, so that each cluster records how many judges marked the corresponding argumentative discourse unit (ADU). These clusters could serve as the basis for assessing the performance of systems that automatically identify ADUs: a system would be rewarded for identifying ADUs that are easier for people to recognize and penalized for identifying ADUs that are relatively hard for people to recognize.
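
The abstract leaves the exact scoring function open, so the weighting below is an illustrative assumption rather than the paper's published metric: each cluster of overlapping gold spans is worth its annotator support, so a system earns more credit for recovering units that many judges marked than for recovering units only one judge marked.

```python
def span_overlap(a, b):
    """True if two (start, end) character spans overlap."""
    return a[0] < b[1] and b[0] < a[1]

def cluster_weighted_recall(clusters, system_spans):
    """Illustrative cluster-weighted scoring (an assumption, not the paper's
    metric): each gold cluster counts with its annotator support, so missing
    an 'easy' high-support unit costs more than missing a 'hard' one."""
    total = sum(support for _, support in clusters)
    earned = sum(support for span, support in clusters
                 if any(span_overlap(span, s) for s in system_spans))
    return earned / total if total else 0.0

# Hypothetical gold clusters as ((start, end), annotator_support) pairs.
gold = [((10, 50), 3), ((60, 80), 1), ((95, 130), 2)]
system = [(12, 48), (100, 125)]               # spans proposed by a system
print(cluster_weighted_recall(gold, system))  # 5/6 ≈ 0.83: only the hard unit was missed
```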

Annotation Study of Argumentative Discourse Units
Some Problems of Unitization Reliability with Existing IAA Metrics
Krippendorff’s α
Hierarchical Clustering of Discourse Units
Findings
Conclusion and Future Work
