Abstract

The ARRAU corpus is an anaphorically annotated corpus of English providing rich linguistic information about anaphora resolution. The most distinctive feature of the corpus is the annotation of a wide range of anaphoric relations, including bridging references and discourse deixis in addition to identity (coreference). Other distinctive features include treating all NPs as markables, including non-referring NPs, and the annotation of a variety of morphosyntactic and semantic mention and entity attributes, including the genericity status of the entities referred to by markables. The corpus, however, has not been extensively used for anaphora resolution research so far. In this paper, we discuss three datasets extracted from the ARRAU corpus to support the three subtasks of the CRAC 2018 Shared Task (identity anaphora resolution over ARRAU-style markables, bridging reference resolution, and discourse deixis); the evaluation scripts assessing system performance on those datasets; and preliminary results on these three tasks that may serve as a baseline for subsequent research on these phenomena.

Highlights

  • The release of the ONTONOTES coreference corpus (Pradhan et al., 2007a) and the organization of two CONLL shared tasks based on the dataset (Pradhan et al., 2012) have resulted in a substantial increase in coreference research, in both quantity and quality

  • A simple form of discourse deixis, event anaphora, is annotated in ONTONOTES; bridging reference was not annotated, although a subset of the corpus has been annotated with this information by Markert et al. (2012)

  • Marasović et al. (2017) developed an approach to abstract anaphora resolution that uses bidirectional LSTMs to produce representations of the anaphor and the candidate sentence, together with a mention-ranking component adapted from the systems of Clark and Manning (2016) and Wiseman et al. (2015)

Summary

Introduction

The release of the ONTONOTES coreference corpus (Pradhan et al., 2007a) and the organization of two CONLL shared tasks based on the dataset (Pradhan et al., 2012) have resulted in a substantial increase in coreference research, in both quantity and quality. Anaphora resolution involves a number of phenomena besides ‘coreference’, such as bridging reference (Clark, 1975) and discourse deixis (Webber, 1991). A simple form of discourse deixis, event anaphora, is annotated in ONTONOTES; bridging reference was not annotated, although a subset of the corpus has been annotated with this information by Markert et al. (2012). In ARRAU, all NPs are considered markables, including expletives and singletons, and both discourse deixis and bridging reference have been annotated. Nevertheless, the corpus has not been extensively used so far. There are a number of reasons for this, ranging from the fact that research on both bridging reference and discourse deixis is still limited, to the unusual markup format. Our hope is that making datasets extracted from ARRAU available may, on the one hand, facilitate the use of the corpus and, on the other, enlarge the community of researchers working on these aspects of anaphora resolution.

Genres
Markables
Types of anaphoric relations marked
Two releases
Markup
Identity anaphora
Discourse Deixis
The Three Tasks of CRAC 2018
Markable Settings
Task 1
Task 2
Task 3
Markable extraction
Conclusions
A Appendix