AnCora-CO: Coreferentially annotated corpora for Spanish and Catalan

Marta Recasens, M Antònia Martí

doi:10.1007/s10579-009-9108-x

Journal: Computers and the Humanities
Publication Date: Dec 1, 2009
Citations: 104

Affiliation: University of Barcelona

#Full Noun Phrases #Coreference Links

Abstract

This article describes the enrichment of the AnCora corpora of Spanish and Catalan (400k each) with coreference links between pronouns (including elliptical subjects and clitics), full noun phrases (including proper nouns), and discourse segments. The coding scheme distinguishes between identity links, predicative relations, and discourse deixis. Inter-annotator agreement on the link types is 85–89% above chance, and we provide an analysis of the sources of disagreement. The resulting corpora make it possible to train and test learning-based algorithms for automatic coreference resolution, as well as to carry out bottom-up linguistic descriptions of coreference relations as they occur in real data.
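To make "agreement above chance" concrete, the sketch below computes a chance-corrected agreement score for two annotators labelling the same links. It uses Cohen's kappa purely as an illustration: the abstract does not say which coefficient the authors used, so this should not be read as their method. The label strings mirror the three link types named above, but the function name `cohen_kappa` and the toy annotations are invented for the example and are not drawn from the corpora.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa).

    Illustrative only: the abstract does not state which agreement
    coefficient was actually used for AnCora-CO.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")

    n = len(labels_a)

    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement: probability of agreeing by chance, based on
    # each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy annotations (not taken from the corpora).
annotator_1 = ["identity", "identity", "predicative",
               "discourse_deixis", "identity", "predicative"]
annotator_2 = ["identity", "identity", "predicative",
               "identity", "identity", "discourse_deixis"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

In this toy case the annotators agree on four of six links, but part of that agreement would be expected from their label frequencies alone, so the chance-corrected score is lower than the raw agreement rate.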

Full Text