Evaluating Syntactic Annotation of Ancient Languages

Erica Biagetti, Salvatore Scarlata, Oliver Hellwig, Elia Ackermann, Paul Widmer

doi:10.1163/26670755-01010003

Open Access

https://doi.org/10.1163/26670755-01010003

Abstract

In this paper we introduce an extended version of the Vedic Treebank (VTB, Hellwig et al. 2020), which comes with revisited and extended annotation guidelines. In order to assess the quality of our annotations as well as the usability and limits of the guidelines, we performed an inter-annotator agreement test. The results show that agreement between annotators is hampered by various factors, most prominently by insufficient understanding of the content, owing to the cultural and temporal gap, and by incomplete knowledge of Vedic grammar. An in-depth discussion of disagreeing annotations demonstrates that the setup of the workflow, too, has a major influence on inter-annotator agreement. We suggest some measures that can help increase transparency and annotation consistency, in line with current knowledge of the language, when annotating Vedic Sanskrit or ancient language varieties in general.

Highlights

Treebanks have become indispensable tools for studying syntactic and morphological phenomena and for enhancing Natural Language Processing. While earlier endeavors in annotating syntactic structure were largely confined to modern languages, an increasing number of treebanks of ancient languages has been published in recent years.
Our paper follows in the wake of other contributions concerned with the process of building linguistic resources for ancient languages, such as the PROIEL treebanks of early Indo-European languages (Eckhoff et al. 2018a,b), the ITTB (Passarotti 2019), the Ancient Greek and Latin Dependency Treebank or, outside of the Indo-European domain, the treebank of Old Chinese (Yasuoka 2019), and with the potential that annotated corpora have for the study of ancient languages (Eckhoff et al. 2018b).
As sentence segmentation turned out to be a source of considerable disagreement, we report a third setting, ‘cleaned-sameSeg’, for the evaluation of the actual syntactic annotation.
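
To make the idea of evaluating syntactic annotation on identical segmentation concrete, here is a minimal sketch of how agreement between two annotators of the same tokens might be computed. This excerpt does not state which agreement measures the authors actually used, so the choice of unlabeled and labeled attachment agreement plus Cohen's kappa on labels is an assumption. The function names and the four-token toy sentence are hypothetical and not taken from the treebank.

```python
from collections import Counter

def attachment_agreement(ann_a, ann_b):
    """Return (unlabeled, labeled) agreement for two annotations of the same tokens.

    Each annotation is a list of (head_index, label) pairs, one per token.
    """
    if len(ann_a) != len(ann_b) or not ann_a:
        raise ValueError("annotations must be non-empty and cover the same tokens")
    n = len(ann_a)
    # Unlabeled: both annotators chose the same head for the token.
    same_head = sum(1 for (ha, _), (hb, _) in zip(ann_a, ann_b) if ha == hb)
    # Labeled: both head and dependency label are identical.
    same_both = sum(1 for a, b in zip(ann_a, ann_b) if a == b)
    return same_head / n, same_both / n

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two label sequences of equal length."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's own label distribution.
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one single label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical toy sentence: the annotators differ on token 3 only,
# in both head and label (an attribute vs. a depictive reading).
annotator_a = [(2, "nsubj"), (0, "root"), (4, "amod"), (2, "obj")]
annotator_b = [(2, "nsubj"), (0, "root"), (2, "acl:dpct"), (2, "obj")]

unlabeled, labeled = attachment_agreement(annotator_a, annotator_b)
kappa = cohen_kappa([lab for _, lab in annotator_a],
                    [lab for _, lab in annotator_b])
print(unlabeled, labeled, round(kappa, 3))
```

Comparing token by token like this only makes sense when both annotators have segmented the text identically, which is presumably why a setting with shared segmentation is reported separately.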

Summary

Introduction

Treebanks have become indispensable tools for studying syntactic and morphological phenomena and for enhancing Natural Language Processing (NLP). While earlier endeavors in annotating syntactic structure were largely confined to modern languages (e.g. the Penn Treebank), an increasing number of treebanks of ancient languages has been published in recent years.

The indigenous tradition partitions the texts with vertical strokes (|, daṇḍa) only at higher levels of compositional complexity (books, chapters, paragraphs in prose; stanzas and hemistiches in metrical texts), but does not feature a punctuation system that structures utterances, clauses, sentences and their constituents. For these reasons, sentence segmentation must be performed manually as part of the annotation process.

Such clitics can depend on any noun in the clause or on the verb, and alternative interpretations of the text lead to alternative dependencies, all of which are acceptable from the point of view of Vedic grammar. In example (16), which is taken from a Ṛgvedic hymn addressing the god Indra, the adjective priyám ‘dear’ can be interpreted as an attribute of mánma ‘thought’ (label amod) or, alternatively, as a depictive secondary predicate (label acl:dpct; Schultze-Berndt & Himmelmann 2004; Himmelmann & Schultze-Berndt 2005; Casaretto 2020), meaning that Indra, the addressee of the hymn, does not generally rejoice at every thought, but only when the thoughts are dear to him.

Journal: Old World: Journal of Ancient Africa and Eurasia
Publication Date: Sep 2, 2021
Citations: 2
License type: CC BY 4.0