Abstract

The increasing availability of information on the web makes the task of named entity recognition (NER) more challenging. NER is an important preprocessing step concerned with extracting entities of interest such as person, location, organization, gene, protein, number, and measurement. Earlier NER systems depended heavily on rule-based techniques or on traditional machine learning algorithms that exploit many linguistic and non-linguistic features. In this article, we propose a novel NER system that combines deep learning strategies with an enhanced version of word embeddings. We develop a bilingual NER system based on a Bidirectional Gated Recurrent Unit (Bi-GRU) and Convolutional Neural Networks (CNN), built upon enhanced word embeddings (EWE). EWE are generated by concatenating FastText word embeddings with a minimal set of feature embeddings, namely part-of-speech embeddings, word prefix embeddings, word suffix embeddings, and word length embeddings, which improve the effectiveness of the deep learning methods. We perform several experiments using corpora in two languages: the IJCNLP-08 NERSSEAL shared task corpus, which contains an annotated Hindi dataset, and a manually annotated Punjabi dataset. We also run several experiments on a bilingual Hindi and Punjabi dataset. The results show that the Bi-GRU and CNN based model with EWE achieves Precision, Recall, and F-score values of 92.60%, 90.70%, and 91.64% respectively for Hindi; 93.87%, 93.33%, and 93.60% respectively for Punjabi; and 93.78%, 92.66%, and 93.22% respectively for bilingual Hindi and Punjabi named entity recognition. Enhanced word embeddings improve the performance of a Bi-GRU and CNN based NER system without relying on a large feature set or any gazetteers.
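The concatenation that produces an enhanced word embedding can be sketched as follows. This is a minimal illustration under stated assumptions, not the system described in the article: the vector sizes, the three-character affix length, the length bucketing, the POS tags, and all names (`EmbeddingTable`, `enhanced_word_embedding`) are hypothetical, and random vectors stand in for both the pretrained FastText vectors and the trained feature embeddings.

```python
import random

# Illustrative sizes only; the article's actual dimensions are not given here.
WORD_DIM, POS_DIM, AFFIX_DIM, LEN_DIM = 8, 3, 2, 2
AFFIX_CHARS = 3       # hypothetical prefix/suffix length in characters
MAX_LEN_BUCKET = 10   # longer words share one length bucket (assumption)
EWE_DIM = WORD_DIM + POS_DIM + 2 * AFFIX_DIM + LEN_DIM


class EmbeddingTable:
    """Lookup table that assigns a fixed random vector to each new key.

    A stand-in for a trainable embedding layer or, for words,
    for pretrained FastText vectors.
    """

    def __init__(self, dim, seed):
        self.dim = dim
        self._rng = random.Random(seed)
        self._vectors = {}

    def lookup(self, key):
        if key not in self._vectors:
            self._vectors[key] = [
                self._rng.uniform(-1.0, 1.0) for _ in range(self.dim)
            ]
        return self._vectors[key]


word_table = EmbeddingTable(WORD_DIM, seed=0)     # stand-in for FastText
pos_table = EmbeddingTable(POS_DIM, seed=1)
prefix_table = EmbeddingTable(AFFIX_DIM, seed=2)
suffix_table = EmbeddingTable(AFFIX_DIM, seed=3)
length_table = EmbeddingTable(LEN_DIM, seed=4)


def enhanced_word_embedding(word, pos_tag):
    """Concatenate word, POS, prefix, suffix, and length embeddings."""
    return (
        word_table.lookup(word)
        + pos_table.lookup(pos_tag)
        + prefix_table.lookup(word[:AFFIX_CHARS])
        + suffix_table.lookup(word[-AFFIX_CHARS:])
        + length_table.lookup(min(len(word), MAX_LEN_BUCKET))
    )


if __name__ == "__main__":
    # Transliterated tokens with illustrative POS tags.
    sentence = [("Kanpur", "NNP"), ("mein", "PSP"), ("barish", "NN")]
    for word, tag in sentence:
        vec = enhanced_word_embedding(word, tag)
        print(word, len(vec))
```

In this sketch, words that share a suffix (for example "Kanpur" and "Nagpur") receive the same suffix sub-vector even when their word vectors differ, which is one way such minimal features could carry information to the sequence model without gazetteers.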

