Abstract

Named Entity Recognition (NER) plays an important role in various Natural Language Processing (NLP) applications by extracting key information from large amounts of unstructured text data. NER is the task of identifying named entities in a given text and classifying them into predefined categories. Recently, pretrained language models have proven highly effective in several NLP tasks, as these state-of-the-art models perform well even under resource scarcity. In this paper, we perform the NER task on the Hindi language by incorporating the recently released multilingual language model MuRIL, which stands for Multilingual Representations for Indian Languages. MuRIL is specially trained for 16 Indian languages. We develop a Hindi NER system using MuRIL with a conditional random field (CRF) layer and fine-tune the model on the ICON 2013 Hindi NER dataset. Further, in the proposed approach, we compute the sum of the representations from the last 4 layers of the MuRIL model instead of using only the last layer's representation, and fine-tune the whole model. Several variants of this model are presented by applying different computations to the token representations produced by different layers of the 12-layer MuRIL architecture. The proposed model achieves state-of-the-art results of 87.89% precision, 83.74% recall, and 85.77% F1-score, outperforming all other existing Hindi NER systems developed on the ICON 2013 dataset. Additionally, we develop a similar Hindi NER system by replacing MuRIL with another state-of-the-art language model, multilingual Bidirectional Encoder Representations from Transformers (mBERT), to compare the efficiency of the two language models on the Hindi NER task.
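The layer-combination step described above can be sketched as follows. This is a minimal illustration with synthetic tensors, not the authors' implementation: it assumes only that, like BERT-base, the 12-layer MuRIL encoder yields one hidden-state matrix of shape (sequence length, hidden size) per layer, and that the variant in question takes the element-wise sum of the last four of them.

```python
import numpy as np

# Hypothetical setup: 12 encoder layers, each producing a matrix of
# per-token representations of shape (seq_len, hidden_size), as in a
# BERT-base-style model such as MuRIL. Values here are random stand-ins.
num_layers, seq_len, hidden_size = 12, 8, 768
rng = np.random.default_rng(0)
hidden_states = [rng.standard_normal((seq_len, hidden_size))
                 for _ in range(num_layers)]

# Instead of feeding only the final layer's output to the CRF layer,
# sum the token representations of the last 4 layers element-wise.
summed = np.sum(hidden_states[-4:], axis=0)

# The combined representation keeps the per-token shape expected by
# the downstream tagging layer.
assert summed.shape == (seq_len, hidden_size)
```

Other variants mentioned in the abstract would replace the sum with a different combination (e.g. averaging, or selecting a different subset of layers) over the same per-layer tensors.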
