Abstract

Coreference resolution tries to identify all expressions (called mentions) in observed text that refer to the same entity. Alongside entity extraction and relation extraction, it is one of the three complementary tasks in Information Extraction. In this paper we describe a novel coreference resolution system, SkipCor, that reformulates the problem as a sequence labeling task. None of the existing supervised, unsupervised, pairwise or sequence-based models is similar to our approach, which uses only linear-chain conditional random fields and offers high scalability through fast model training and inference and straightforward parallelization. We evaluate the proposed system on the ACE 2004, CoNLL 2012 and SemEval 2010 benchmark datasets. SkipCor clearly outperforms two baseline systems that detect coreferentiality using the same features as SkipCor. The obtained results are at least comparable to the current state of the art in coreference resolution.

Highlights

  • The field of Information Extraction (IE) deals with automatic extraction of structured information such as person names, locations, organizations etc. from unstructured or semi-structured text

  • In this paper we describe a novel coreference resolution system, ‘SkipCor’, which is based on the well-known conditional random fields algorithm [11]

  • We will provide an overview of the different coreference resolution systems, with special focus on approaches based on graphical models [11]


Summary

Introduction

The field of Information Extraction (IE) deals with the automatic extraction of structured information, such as person names, locations and organizations, from unstructured or semi-structured text. We are still unable to extract information with high precision and recall, especially when performing IE on large unstructured datasets such as the web. This, together with the rapid growth in the amount of unstructured data, makes the IE field increasingly important.

Bengtson and Roth [17] systematically divided different feature functions into categories and clearly demonstrated their importance. They showed that well-designed features can greatly improve the performance of a coreference resolution system. Due to the similarities among the proposed supervised systems, the Reconcile platform [21] was developed to provide a common framework for new algorithms, features, and their evaluation.
