GRU based Named Entity Recognition System for Bangla Online Newspapers

Nayan Banik, Md Hasan Hafizur Rahman

doi:10.1109/ciet.2018.8660795

Abstract

Information Extraction (IE) uses automated systems to locate important entities and their underlying connections in textual documents, and it is crucial to applications including Data Mining (DM), Question Answering (QA), and Machine Translation (MT). Named Entity Recognition (NER), a sub-component of Natural Language Processing (NLP), is an IE task that aims to locate textual mentions of entities belonging to a prescribed set of classes. Because of its political and geographical influence, the Bangla language is widely spoken around the globe, and it is important to enrich its linguistic resources through NLP tools, for which NER is a common pre-processing step. With the proliferation of Bangla online newspapers, the body of Bangla text on the World Wide Web (WWW) is growing rapidly but is still at a formative stage. Researchers have applied classical learning algorithms to the Bangla NER task, while a few have used hand-crafted rules. Recent advances show that Deep Learning techniques can boost NER performance. This work therefore applies a variant of the Recurrent Neural Network (RNN), specifically a Gated Recurrent Unit (GRU) model, to Bangla NER using a manually annotated dataset. The evaluation of our experimental results indicates how our approach can perform better when applied to a large-scale dataset.

Full Text