EUSKOR: End-to-end coreference resolution system for Basque.

Ander Soraluze, Xabier Arregi, Olatz Arregi, Arantza Díaz De Ilarraza, Natalia Grabar

doi:10.1371/journal.pone.0221801

Open Access

https://doi.org/10.1371/journal.pone.0221801

Journal: PloS one
Publication Date: Sep 12, 2019
License type: CC BY 4.0

Affiliation: University of the Basque Country, Polymat

Abstract

This paper describes the process of adapting the Stanford Coreference resolution module to the Basque language, taking into account the characteristics of the language. The module has been integrated into a linguistic analysis pipeline, yielding an end-to-end coreference resolution system for Basque. The adaptation process explained here can benefit other languages with similar characteristics and facilitate the implementation of their coreference resolution systems. During the experimentation phase, we demonstrated that language-specific features have a noteworthy effect on coreference resolution, obtaining a gain in CoNLL score of 7.07 with respect to the baseline system. We also analysed the effect that preprocessing has on coreference resolution, comparing the results obtained with automatic mentions versus gold mentions. When gold mentions are provided, the results increase by 11.5 points in CoNLL score in comparison with the results obtained when automatic mentions are used. The contribution of each sieve is analysed, concluding that morphology is essential for agglutinative languages to obtain good performance in coreference resolution. Finally, an error analysis of the coreference resolution system is presented, which has revealed our system's weak points and helps to determine how the system can be improved. As a result of the error analysis, we have enriched the Basque coreference resolution system by adding two new sieves, obtaining an improvement of 0.24 points in CoNLL F1 when automatic mentions are used and of 0.39 points when gold mentions are provided.
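The gains above are reported in CoNLL score. As commonly defined for the CoNLL shared tasks, this score is the unweighted mean of the F1 values of three metrics: MUC, B-cubed and entity-based CEAF. The following is a minimal sketch of that arithmetic. The function name and all input values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def conll_score(muc_f1: float, b3_f1: float, ceafe_f1: float) -> float:
    """Return the unweighted mean of the MUC, B-cubed and CEAF-e F1 values."""
    return (muc_f1 + b3_f1 + ceafe_f1) / 3.0


# Illustrative F1 values only (NOT taken from the paper).
baseline = conll_score(60.0, 55.0, 50.0)
improved = conll_score(66.0, 61.0, 57.0)

# A "gain in CoNLL score" is the difference between two such averages.
gain = improved - baseline
print(f"baseline={baseline:.2f} improved={improved:.2f} gain={gain:.2f}")
```

Because the score is an average, a gain of a given size can come from one metric improving a lot or from all three improving a little, which is why per-metric results are usually reported alongside it.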

Highlights

Coreference resolution consists of identifying textual expressions that refer to real-world objects and determining which of these mentions refer to the same entity
Much attention has been paid to the problem of coreference resolution and many evaluation campaigns focusing on the topic have been undertaken in the last decades, from MUC-6 [3] in 1995 to the CoNLL shared task in 2012 [4]
EUSKOR outperforms the baseline system according to F1 on all the metrics

Summary

Introduction

Coreference resolution consists of identifying textual expressions (mentions) that refer to real-world objects (entities) and determining which of these mentions refer to the same entity. Creating completely language-independent systems is difficult, whereas taking the characteristics of a language into account benefits performance on this task. In this scenario, a possible solution is to take a state-of-the-art system with a flexible modular architecture and adapt it to resolve coreference in the new language. The process we carried out demonstrates that a modular architecture facilitates the development of robust coreference resolution systems for languages whose characteristics differ from those of the language for which the system was originally created. We describe the most important characteristics of Basque and the challenges they present for coreference resolution.
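The modular, sieve-based design mentioned here can be illustrated with a small sketch: each sieve is an independent rule, and the sieves are applied one pass at a time, with the stricter rule first. This is not the Stanford module's API and not EUSKOR's actual sieve set. The `Mention` class, the two sieves and the English example mentions are all hypothetical, and the `head` field stands in for a lemma that a morphological analyser would supply.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Mention:
    idx: int   # position in document order (hypothetical field)
    text: str  # surface form of the mention
    head: str  # head lemma, assumed to come from earlier analysis


Sieve = Callable[[Mention, Mention], bool]


def exact_match(antecedent: Mention, mention: Mention) -> bool:
    """Stricter sieve: identical surface strings, ignoring case."""
    return antecedent.text.lower() == mention.text.lower()


def head_match(antecedent: Mention, mention: Mention) -> bool:
    """Looser sieve: identical head lemmas, ignoring case."""
    return antecedent.head.lower() == mention.head.lower()


def resolve(mentions: List[Mention], sieves: List[Sieve]) -> List[List[str]]:
    """Apply the sieves in order, merging each mention with its closest match."""
    parent: Dict[int, int] = {m.idx: m.idx for m in mentions}

    def find(i: int) -> int:
        # Follow parent links up to the cluster representative.
        while parent[i] != i:
            i = parent[i]
        return i

    for sieve in sieves:  # one full pass over the document per sieve
        for j, mention in enumerate(mentions):
            # Scan candidate antecedents from the closest to the farthest.
            for antecedent in reversed(mentions[:j]):
                if sieve(antecedent, mention):
                    parent[find(mention.idx)] = find(antecedent.idx)
                    break

    clusters: Dict[int, List[str]] = {}
    for m in mentions:
        clusters.setdefault(find(m.idx), []).append(m.text)
    return list(clusters.values())


mentions = [
    Mention(0, "the president", "president"),
    Mention(1, "Barack Obama", "Obama"),
    Mention(2, "The president", "president"),
    Mention(3, "Obama", "Obama"),
]
clusters = resolve(mentions, [exact_match, head_match])
print(clusters)
```

Since each sieve is just a function over two mentions, adapting such a design to another language amounts to replacing or adding functions in the list passed to `resolve`, without touching the merging loop.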

Related work

Pronouns

Possessives: We consider two types of possessives

Verbal nouns

NPs as part of complex postpositions

NPs containing subordinate clauses

Ellipsis

Coordination

Results

Conclusions and future work
