SkipCor: skip-mention coreference resolution using linear-chain conditional random fields.

Slavko Žitnik, Marko Bajec, Lovro Šubelj, Neil R Smalheiser

doi:10.1371/journal.pone.0100101


Open Access

https://doi.org/10.1371/journal.pone.0100101


Journal: PLoS ONE
Publication Date: Jun 23, 2014
Citations: 46
License type: CC BY 4.0

Affiliation: University of Ljubljana

Abstract

Coreference resolution tries to identify all expressions (called mentions) in observed text that refer to the same entity. Alongside entity extraction and relation extraction, it is one of the three complementary tasks in Information Extraction. In this paper we describe a novel coreference resolution system, SkipCor, that reformulates the problem as a sequence labeling task. None of the existing supervised, unsupervised, pairwise or sequence-based models is similar to our approach, which uses only linear-chain conditional random fields and supports high scalability through fast model training and inference and straightforward parallelization. We evaluate the proposed system on the ACE 2004, CoNLL 2012 and SemEval 2010 benchmark datasets. SkipCor clearly outperforms two baseline systems that detect coreferentiality using the same features as SkipCor. The obtained results are at least comparable to the current state of the art in coreference resolution.
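The abstract states only that SkipCor recasts coreference resolution as sequence labeling, and the title adds that it works with skip-mentions. One plausible reading, sketched below under stated assumptions, orders mentions by position, splits them into `skip_distance + 1` interleaved sequences, labels each mention `C` if it corefers with its predecessor in that sequence and `O` otherwise, and merges the resulting links into clusters by transitive closure. The function names and the `skip_distance` parameter are hypothetical, and gold labels stand in for the predictions a CRF would make.

```python
from typing import Dict, List, Tuple


def skip_mention_sequences(num_mentions: int, skip_distance: int) -> List[List[int]]:
    """Split mentions 0..num_mentions-1 (in document order) into interleaved sequences.

    Hypothetical parameter: skip_distance = 0 keeps the plain sequence, 1 takes
    every second mention, and so on, giving skip_distance + 1 sequences.
    """
    step = skip_distance + 1
    return [list(range(start, num_mentions, step)) for start in range(min(step, num_mentions))]


def gold_labels(sequence: List[int], entity_of: Dict[int, int]) -> List[str]:
    """Label a mention 'C' if it corefers with its predecessor in the sequence, else 'O'."""
    if not sequence:
        return []
    labels = ["O"]  # the first mention in a sequence has no predecessor
    for prev, cur in zip(sequence, sequence[1:]):
        labels.append("C" if entity_of[prev] == entity_of[cur] else "O")
    return labels


def links_from_labels(sequence: List[int], labels: List[str]) -> List[Tuple[int, int]]:
    """Turn 'C' labels into (predecessor, mention) coreference links."""
    return [(prev, cur) for prev, cur, lab in zip(sequence, sequence[1:], labels[1:]) if lab == "C"]


def clusters_from_links(num_mentions: int, links: List[Tuple[int, int]]) -> List[List[int]]:
    """Merge pairwise links into entity clusters with union-find (transitive closure)."""
    parent = list(range(num_mentions))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: Dict[int, List[int]] = {}
    for m in range(num_mentions):
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())


# Toy document: six mentions; entity_of maps each mention to a gold entity id.
entity_of = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 0}
links: List[Tuple[int, int]] = []
for s in (0, 1):
    for seq in skip_mention_sequences(6, s):
        labels = gold_labels(seq, entity_of)  # stand-in for CRF predictions
        links += links_from_labels(seq, labels)
print(clusters_from_links(6, links))  # [[0, 2, 4, 5], [1, 3]]
```

With `skip_distance = 0` alone, only mentions 4 and 5 would be linked; adding `skip_distance = 1` recovers links between mentions separated by an intervening mention of another entity. Because each sequence is labeled independently, a scheme like this would be easy to parallelize.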

Highlights

The field of Information Extraction (IE) deals with automatic extraction of structured information, such as person names, locations and organizations, from unstructured or semi-structured text.
In this paper we describe a novel coreference resolution system, ‘SkipCor’, which is based on the well-known conditional random fields algorithm [11].
We will provide an overview of the different coreference resolution systems, with special focus on approaches based on graphical models [11].
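A linear-chain CRF finds the highest-scoring label sequence with Viterbi decoding. The sketch below decodes a three-mention sequence over the labels `O` and `C` using hand-picked log-space scores; in a trained CRF, the emission and transition scores would come from learned weights applied to feature functions. The numbers and the `viterbi` helper are illustrative assumptions, not values or code from the paper.

```python
from typing import List, Sequence

LABELS = ("O", "C")  # O: not coreferent with predecessor, C: coreferent


def viterbi(emissions: Sequence[Sequence[float]], transitions: Sequence[Sequence[float]]) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain model.

    emissions[t][y]   : score of label y at position t (in a CRF, a weighted sum of features)
    transitions[y][z] : score of moving from label y to label z
    Scores are in log space, so they are added rather than multiplied.
    """
    n = len(emissions)
    if n == 0:
        return []
    k = len(LABELS)
    best = [list(emissions[0])]  # best[t][y]: best score of any prefix ending in label y
    back: List[List[int]] = []   # back[t-1][z]: best previous label for label z at position t
    for t in range(1, n):
        row, ptr = [], []
        for z in range(k):
            cand = [best[t - 1][y] + transitions[y][z] for y in range(k)]
            y_best = max(range(k), key=lambda y: cand[y])
            row.append(cand[y_best] + emissions[t][z])
            ptr.append(y_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the best final label.
    y = max(range(k), key=lambda lab: best[-1][lab])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return [LABELS[i] for i in reversed(path)]


# Hand-picked scores; a large negative score discourages C on the first mention,
# which has no predecessor in its sequence.
emissions = [[0.0, -5.0], [0.2, 1.5], [1.0, 0.1]]
transitions = [[0.0, 0.0], [0.0, -0.5]]
print(viterbi(emissions, transitions))  # ['O', 'C', 'O']
```

Decoding runs in time linear in the sequence length, which fits the abstract's emphasis on fast inference.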

Summary

Introduction

The field of Information Extraction (IE) deals with automatic extraction of structured information, such as person names, locations and organizations, from unstructured or semi-structured text. We are still not able to extract information with high precision and recall, especially when performing IE on large unstructured datasets such as the web. This, together with the rapid growth in the amount of unstructured data, makes the IE field increasingly important.

Bengtson and Roth [17] systematically divided different feature functions into categories and clearly demonstrated their importance. They showed that well-designed features can greatly improve the performance of a coreference resolution system. Due to the similarities among the proposed supervised systems, the Reconcile platform [21] was developed to provide a common framework for new algorithms, features, and their evaluation.
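Bengtson and Roth [17] group coreference features into categories. The sketch below computes a handful of illustrative pair features in that spirit (string match, head match, gender and number agreement, sentence distance). The `Mention` fields, the `pair_features` helper and the feature names are hypothetical and are not taken from SkipCor or from [17]; in a linear-chain setting, features like these could be evaluated for each pair of neighbouring mentions in a sequence.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Mention:
    # Hypothetical mention representation; real systems carry much more
    # (POS tags, parse information, named-entity types, ...).
    text: str
    head: str
    sentence: int
    gender: str  # "m", "f", "n" or "unknown"
    number: str  # "sg", "pl" or "unknown"


Feature = Union[bool, int]


def _compatible(x: str, y: str) -> bool:
    """Attributes agree if they are equal or if either one is unknown."""
    return x == y or "unknown" in (x, y)


def pair_features(a: Mention, b: Mention) -> Dict[str, Feature]:
    """Binary and numeric features for a candidate pair, where a precedes b."""
    return {
        # String-relation features
        "exact_match": a.text.lower() == b.text.lower(),
        "head_match": a.head.lower() == b.head.lower(),
        # Semantic agreement features
        "gender_agree": _compatible(a.gender, b.gender),
        "number_agree": _compatible(a.number, b.number),
        # Relative-location feature
        "sentence_distance": b.sentence - a.sentence,
    }


a = Mention("Barack Obama", "Obama", 0, "m", "sg")
b = Mention("Obama", "Obama", 2, "m", "sg")
print(pair_features(a, b))
# {'exact_match': False, 'head_match': True, 'gender_agree': True, 'number_agree': True, 'sentence_distance': 2}
```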
