Mention detection is an important component of a Coreference Resolution (CR) system, in which named, nominal, and pronominal mentions are identified. These mentions can be coreferential or singleton (non-coreferential). Coreferential mentions refer to the same real-world entity as other mentions in the text, whereas singleton mentions occur only once and do not participate in coreference because they are never mentioned again. Filtering out singleton mentions can substantially improve the performance of a CR system. This paper proposes a singleton mention detection module for Hindi text based on a Long Short-Term Memory (LSTM) network and a Fully Connected Network (FCN), which identifies singleton mentions so that they can be filtered out to reduce the search space for CR. A CR system searches the preceding text for earlier references of each mention; removing singletons from the mention list therefore reduces both the search time and the search space. The model uses a few hand-crafted features, context information, and word embeddings from word2vec and a multilingual Bidirectional Encoder Representations from Transformers (mBERT) language model. A coreference-annotated Hindi dataset comprising 3.6K sentences and 78K tokens is used for the task. The singleton mention detection model is analyzed extensively by experimenting with various context window lengths for each mention. Among the window sizes tried (2, 3, 4, 5, etc., as well as all preceding and all following words of each mention), the model performs best with a context window of size two. With this window size, the LSTM-FCN model with mBERT (Word + Context + Syntactic) features achieves a Precision of 63%, a Recall of 71%, and an F-measure of 67% for identifying singleton mentions.
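For illustration, the sketch below shows how an LSTM-FCN singleton classifier of the kind described above might be assembled. It is a minimal sketch under assumptions not stated in the abstract: PyTorch, 768-dimensional mBERT-style embeddings, a placeholder hand-crafted feature size, and a ±2-word context window. The class and variable names are hypothetical, and the paper's actual layer sizes and features may differ.

```python
# Hypothetical sketch of an LSTM-FCN singleton mention classifier.
# Dimensions, names, and feature choices are assumptions for illustration only.
import torch
import torch.nn as nn

class SingletonDetector(nn.Module):
    def __init__(self, emb_dim=768, feat_dim=10, hidden_dim=128):
        super().__init__()
        # Bidirectional LSTM reads the mention plus a +/-2 word context window.
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Fully connected layers combine the LSTM summary with
        # hand-crafted syntactic features.
        self.fcn = nn.Sequential(
            nn.Linear(2 * hidden_dim + feat_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, window_embs, syntactic_feats):
        # window_embs: (batch, window_len, emb_dim) mBERT/word2vec embeddings
        # syntactic_feats: (batch, feat_dim) hand-crafted feature vector
        _, (h_n, _) = self.lstm(window_embs)
        # Concatenate the final forward and backward hidden states.
        summary = torch.cat([h_n[0], h_n[1]], dim=-1)
        logits = self.fcn(torch.cat([summary, syntactic_feats], dim=-1))
        return torch.sigmoid(logits)  # probability that the mention is a singleton

# Example: a 5-token span (mention word + 2 context words on each side).
model = SingletonDetector()
probs = model(torch.randn(4, 5, 768), torch.randn(4, 10))
```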