Analyzing Disagreements in Argumentation Annotation of Scientific Texts in Russian Language

I S Pimenov

doi:10.25205/1818-7935-2023-21-2-89-104

Abstract

This paper analyzes inter-annotator disagreements in modeling argumentation in scientific papers. The aim of the study is to specify annotation guidelines for typical disagreement cases. The analysis covers disagreements at three annotation levels: identification of theses, construction of links between theses, and specification of reasoning models for these links. The dataset contains 20 argumentation annotations of 10 scientific papers from two thematic areas; two experts annotated each text independently. Together, these 20 annotations include 917 theses and 773 arguments. Annotating a text consisted of modeling its argumentation structure in accordance with the Argument Interchange Format. This model represents an annotated text as a directed graph with two node types: information nodes for statements, and scheme nodes for the links between them and the reasoning models in these links. Reasoning models are identified according to Walton’s classification. To identify disagreements between annotators, we automatically compare the graphs that represent the argumentation structure of the same text. This comparison includes three stages: 1) identification of theses that are present in one graph and absent from the other; 2) detection of links that connect corresponding theses differently in the two graphs; 3) identification of different reasoning models specified for the same links. Next, an expert analysis of the automatically identified discrepancies determines the typical disagreement cases based on the structural properties of argumentation graphs (positioning of theses, configuration of links between statements at different distances in the text, and the ratio between the overall frequency of a reasoning model in annotations and the frequency of disagreements over its identification).
The study shows that correspondence between argumentation graphs averages 78 % for theses, 55 % for links, and 60 % for reasoning models. Typical disagreement cases include 1) detection of theses stated in a text without explicit justification; 2) construction of links between theses in the same paragraph or four or more paragraphs apart; 3) identification of two specific reasoning models (associated with 40 % and 33 % of disagreements, respectively); 4) confusion between functionally different schemes because annotators perceive links in different aspects. The study results in annotation guidelines for minimizing typical disagreement cases at each level of argumentation structure.
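
The three-stage graph comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that theses from the two annotations have already been aligned and share identifiers, and it flattens each scheme node into a (premise, conclusion) pair labeled with a scheme name. The names ArgGraph and compare_annotations, the thesis identifiers, and the scheme labels are hypothetical. The abstract does not give the formula behind the correspondence percentages, so the sketch only lists discrepancies and computes no scores.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Link = Tuple[str, str]  # (premise thesis id, conclusion thesis id)


@dataclass
class ArgGraph:
    """Simplified, hypothetical stand-in for one AIF-style annotation."""
    theses: Set[str] = field(default_factory=set)
    # Each link carries the name of the reasoning scheme chosen for it.
    links: Dict[Link, str] = field(default_factory=dict)


def compare_annotations(a: ArgGraph, b: ArgGraph) -> dict:
    """List discrepancies between two annotations of the same text."""
    # Stage 1: theses present in one graph and absent from the other.
    shared_theses = a.theses & b.theses
    only_in_a = a.theses - b.theses
    only_in_b = b.theses - a.theses

    # Stage 2: among theses both annotators identified, find links that
    # appear in only one of the two graphs.
    def restrict(g: ArgGraph) -> Dict[Link, str]:
        return {
            link: scheme
            for link, scheme in g.links.items()
            if link[0] in shared_theses and link[1] in shared_theses
        }

    links_a, links_b = restrict(a), restrict(b)
    shared_links = links_a.keys() & links_b.keys()
    differing_links = links_a.keys() ^ links_b.keys()

    # Stage 3: links both annotators drew, but with different schemes.
    scheme_conflicts = {
        link: (links_a[link], links_b[link])
        for link in shared_links
        if links_a[link] != links_b[link]
    }

    return {
        "only_in_a": only_in_a,
        "only_in_b": only_in_b,
        "differing_links": differing_links,
        "scheme_conflicts": scheme_conflicts,
    }
```

Given two ArgGraph objects for the same text, compare_annotations returns the unmatched theses, the links drawn by only one annotator, and the links on which the annotators chose different schemes; an expert would then inspect these discrepancies by hand, as the abstract describes.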
