Retrieving meaningful information from the voluminous data available on the internet is a major challenge. Named entity recognition (NER) systems address this challenge and achieve promising results in accessing information of interest in several NLP applications. The present study proposes a deep learning-based NER system using a hybrid embedding, which combines fastText word embeddings with bidirectional LSTM-based character embeddings. These embeddings capture the contextual, syntactic, and semantic properties of the text, which improves the predictive power of deep learning methods. We performed experiments with important variants of the recurrent neural network (RNN), namely the long short-term memory network (LSTM), bidirectional LSTM, gated recurrent unit (GRU), and bidirectional GRU, on a manually annotated Punjabi dataset and an annotated Hindi dataset collected from the International Joint Conference on Natural Language Processing (IJCNLP-08) website. We also explored different word embeddings and character embeddings for the NER task. Across all experiments, the bidirectional GRU model with the hybrid embedding performed best, with precision, recall, and F-score values of 84%, 83%, and 83.50%, respectively, for Punjabi NER and 75%, 77%, and 75.99%, respectively, for Hindi NER.
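The reported F-scores can be related to the reported precision and recall with a short calculation. The sketch below assumes the F-score is the balanced F1 measure, i.e. the harmonic mean of precision and recall; the abstract does not state this explicitly, and the function name `f_score` is illustrative only.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1 measure: harmonic mean of precision and recall.

    Assumption: the abstract's "f-score" is F1 (equal weight on
    precision and recall). Inputs and output are percentages.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the bidirectional GRU model.
punjabi = f_score(84, 83)
hindi = f_score(75, 77)

print(f"Punjabi F-score: {punjabi:.2f}%")  # 83.50%
print(f"Hindi F-score:   {hindi:.2f}%")    # 75.99%
```

Under this assumption, both computed values agree with the F-scores reported above to two decimal places.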