Use of named entity recognition and co-reference resolution tools for segmenting English texts

Pavlina Fragkou

doi:10.1145/2801948.2802004

Abstract

In this paper we examine the benefit of performing named entity recognition (NER) and co-reference resolution on an English corpus used for text segmentation. The aim is to examine whether combining text segmentation with information extraction helps identify the various topics that appear in a document. NER was performed on the English corpus in two ways: (a) by using already available NER and co-reference resolution tools, and (b) by manually annotating the text to cover four types of named entities and substituting every reference to the same instance with the same named entity identifier. The benefit of manual annotation over a combination of existing tools was evaluated using two well-known text segmentation algorithms. The comparison leads to the conclusion that the benefit depends heavily on the segment's topic and length, the number of named entity instances appearing in it, and the model on which each NER and co-reference resolution tool was trained.
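
The substitution step described in the abstract, where every reference to the same instance is replaced by one named entity identifier, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `substitute_entities`, the annotation dictionary, and identifiers such as `PERSON_1` and `LOCATION_1` are hypothetical, and the mention-to-identifier mapping is assumed to come from manual annotation or from an NER and co-reference tool.

```python
import re


def substitute_entities(text, annotations):
    """Replace each annotated mention with its entity identifier.

    `annotations` is a hypothetical mapping from a surface mention
    (including co-referring forms such as pronouns) to an identifier.
    """
    if not annotations:
        return text
    # Try longer mentions first so "Barack Obama" wins over "Obama".
    mentions = sorted(annotations, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(m) for m in mentions) + r")\b"
    )
    return pattern.sub(lambda m: annotations[m.group(0)], text)


# Illustrative annotations only; the identifier scheme is an assumption.
annotations = {
    "Barack Obama": "PERSON_1",
    "Obama": "PERSON_1",
    "He": "PERSON_1",
    "Chicago": "LOCATION_1",
}
text = "Barack Obama moved to Chicago. He worked there. Obama later left Chicago."
result = substitute_entities(text, annotations)
print(result)
```

After substitution, all mentions of one instance share a single token, so a segmentation algorithm that relies on word repetition would see them as the same term. A plain dictionary lookup like this ignores context, so an ambiguous pronoun would need per-occurrence annotations instead.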

Full Text