Recognition of Named Entities and Categories in Text using Stacked Embeddings

Paras Narendranath, Javaid Nabi, Br Shambhavi, P Jayarekha, N Shreyas, Harika Jayanthi

doi:10.1109/iccca49541.2020.9250886

Abstract

Named entities identify the key elements in a text, while sentence classification summarizes its content. Together, sequence labeling and sentence classification enable deeper extraction of information from text. Embeddings trained on a domain-specific corpus tend to yield stronger vector representations and therefore better classification models. We propose custom fastText embeddings trained on a large Indian English news corpus. Stacked with state-of-the-art Pooled Flair embeddings, these embeddings achieve an F1-score of 79 on a custom FIRE English NER dataset and an F1-score of 93.05 on a subset of the OntoNotes 5.0 dataset. The embeddings were also used for sentence classification over 20 news categories, where the best multi-class accuracy was 88.1%. We also propose two Indian news datasets: one based on the FIRE NER dataset, and a custom multi-class sentence classification dataset.
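
As an illustration of what stacking means here, the sketch below concatenates, token by token, the vectors produced by two embedding sources. It is a minimal toy under stated assumptions, not the authors' implementation: the lookup tables, their values, the dimensions, and the helper names (make_lookup_embedder, stack_embeddings) are all hypothetical, and the second table is only a static stand-in for a contextual embedding such as Pooled Flair, which in practice depends on the surrounding sentence.

```python
from typing import Callable, Dict, List

Vector = List[float]
Embedder = Callable[[str], Vector]


def make_lookup_embedder(table: Dict[str, Vector], dim: int) -> Embedder:
    """Return an embedder that looks tokens up in a table.

    Unknown tokens map to a zero vector of length `dim` (an assumption
    made for this toy; real models handle unknown words differently).
    """
    def embed(token: str) -> Vector:
        return list(table.get(token, [0.0] * dim))
    return embed


def stack_embeddings(embedders: List[Embedder], tokens: List[str]) -> List[Vector]:
    """Concatenate the vectors from every embedder, token by token."""
    stacked: List[Vector] = []
    for token in tokens:
        vector: Vector = []
        for embed in embedders:
            vector.extend(embed(token))
        stacked.append(vector)
    return stacked


# Hypothetical stand-in for domain-specific word vectors (made-up values).
domain_vectors = make_lookup_embedder(
    {"visited": [0.2, 0.7], "Delhi": [0.9, 0.1]}, dim=2
)
# Hypothetical static stand-in for a contextual embedding (made-up values).
context_vectors = make_lookup_embedder(
    {"visited": [0.5, 0.1, 0.0], "Delhi": [0.3, 0.3, 0.4]}, dim=3
)

tokens = ["She", "visited", "Delhi"]
stacked = stack_embeddings([domain_vectors, context_vectors], tokens)

for token, vector in zip(tokens, stacked):
    print(token, len(vector), vector)
```

Each stacked vector has a length equal to the sum of the individual embedding sizes (2 + 3 = 5 in this toy), so a downstream tagger or classifier sees both representations at once.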

Full Text