Annotating a broad range of anaphoric phenomena, in a variety of genres: the ARRAU Corpus

Olga Uryupina, Ron Artstein, Kepa J Rodriguez, Massimo Poesio, Federica Cavicchio, Antonella Bristot, Francesca Delogu

doi:10.1017/s1351324919000056

Abstract

This paper presents the second release of ARRAU, a multigenre corpus of anaphoric information created over 10 years to provide data for the next generation of coreference/anaphora resolution systems combining different types of linguistic and world knowledge with advanced discourse modeling supporting rich linguistic annotations. The distinguishing features of ARRAU include the following: treating all NPs as markables, including non-referring NPs, and annotating their (non-)referentiality status; distinguishing between several categories of non-referentiality and annotating non-anaphoric mentions; thorough annotation of markable boundaries (minimal/maximal spans, discontinuous markables); annotating a variety of mention attributes, ranging from morphosyntactic parameters to semantic category; annotating the genericity status of mentions; annotating a wide range of anaphoric relations, including bridging relations and discourse deixis; and, finally, annotating anaphoric ambiguity. The current version of the dataset contains 350K tokens and is publicly available from LDC. In this paper, we discuss in detail all the distinguishing features of the corpus, so far only partially presented in a number of conference and workshop papers, and we also discuss the development between the first release of ARRAU in 2008 and this second one.
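To make the annotation dimensions listed in the abstract more concrete, the following is a minimal sketch of how a single markable might be represented in memory. It is an illustration only: the class name, field names, label strings, and token-offset convention are all assumptions made for this example and do not reflect the corpus's actual file format or attribute inventory.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical convention: (start_token, end_token), both inclusive.
Span = Tuple[int, int]


@dataclass
class Markable:
    """Illustrative record for one NP markable.

    All field names and label values are assumptions for this sketch,
    not the corpus's real schema.
    """
    markable_id: str
    spans: List[Span]                          # more than one span = discontinuous markable
    min_span: Optional[Span] = None            # minimal span, if annotated
    referring: bool = True                     # False for non-referring NPs
    non_referring_type: Optional[str] = None   # hypothetical label, e.g. "expletive"
    generic: Optional[bool] = None             # genericity status, if annotated
    category: Optional[str] = None             # semantic category, if annotated
    relation: Optional[str] = None             # hypothetical labels: "identity", "bridging", "discourse_deixis"
    antecedents: List[str] = field(default_factory=list)  # more than one = ambiguous

    def is_discontinuous(self) -> bool:
        return len(self.spans) > 1

    def is_ambiguous(self) -> bool:
        return len(self.antecedents) > 1

    def is_anaphoric(self) -> bool:
        # Only referring markables with at least one antecedent count as anaphoric here.
        return self.referring and bool(self.antecedents)


# Usage: an ambiguous, discontinuous anaphor and a non-referring expletive.
pronoun = Markable("m1", spans=[(0, 1), (4, 5)],
                   relation="identity", antecedents=["m0", "m2"])
expletive = Markable("m3", spans=[(7, 7)],
                     referring=False, non_referring_type="expletive")
```

Storing antecedents as a list is one simple way to accommodate anaphoric ambiguity, since a markable with several candidate antecedents needs no special case; whether the corpus encodes ambiguity this way is not stated in the abstract.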

Journal: Natural Language Engineering	Publication Date: May 7, 2019
Citations: 31
