Applying named entity recognition and co-reference resolution for segmenting English texts

Pavlina Fragkou

doi:10.1007/s13748-017-0127-3

Abstract

In this paper we examine the benefit of applying named entity recognition (NER) and co-reference resolution to an English corpus used for text segmentation. The aim is to determine whether combining text segmentation with information extraction helps identify the various topics that appear in a document. NER was performed manually on the English corpus and compared with the output of publicly available annotation tools. The produced annotations were manually corrected and enriched to cover four types of named entities (person, location, date and group of names). Co-reference resolution, i.e., the substitution of every reference to the same instance with the same named entity identifier, was subsequently performed. The evaluation, using five text segmentation algorithms, leads to the conclusion that manual annotation appears to be a promising solution, given the unclear contribution of publicly available tools.
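The substitution step described in the abstract can be illustrated with a minimal sketch. It assumes a hand-built mapping from surface mentions to identifiers; the function name `substitute_mentions` and the identifier format (`PERSON_1`, `LOCATION_1`, `DATE_1`) are hypothetical and are not taken from the paper.

```python
import re


def substitute_mentions(text, mention_to_id):
    """Replace every listed mention with its named entity identifier.

    `mention_to_id` is an assumed, manually prepared mapping; mentions
    that refer to the same instance share the same identifier.
    """
    if not mention_to_id:
        return text
    # Try longer mentions first so "John Smith" is matched before "Smith".
    mentions = sorted(mention_to_id, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(m) for m in mentions) + r")\b"
    )
    return pattern.sub(lambda m: mention_to_id[m.group(0)], text)


# Hypothetical annotations for a two-sentence example.
annotations = {
    "John Smith": "PERSON_1",
    "Smith": "PERSON_1",
    "Athens": "LOCATION_1",
    "the city": "LOCATION_1",
    "1999": "DATE_1",
}

sentence = "John Smith visited Athens in 1999. Smith liked the city."
resolved = substitute_mentions(sentence, annotations)
print(resolved)
```

After substitution, both sentences share the tokens `PERSON_1` and `LOCATION_1`, so a segmentation algorithm that relies on word repetition sees them as lexically related even though the original surface forms differ.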

Full Text