Named Entity Recognition in Bengali

Asif Ekbal, Sivaji Bandyopadhyay

doi:10.3384/nejlt.2000-1533.091226

Abstract

This paper reports on a multi-engine approach to developing a Named Entity Recognition (NER) system for Bengali that combines Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM) classifiers through weighted voting. The training set consists of approximately 272K wordforms, of which 150K have been manually annotated with the four major named entity (NE) tags: Person name, Location name, Organization name and Miscellaneous name. A tag conversion routine has been defined to convert the 122K wordforms of the IJCNLP-08 NER Shared Task on South and South East Asian Languages (NERSSEAL) data into the desired form. The individual classifiers use different contextual information about the words, along with a variety of features that help predict the various NE classes. Lexical context patterns, generated semi-automatically from an unlabeled corpus of 3 million wordforms, are used as classifier features to improve performance. In addition, we propose a number of techniques for post-processing the output of each classifier to reduce errors and improve performance further. Finally, we use three weighted voting techniques to combine the individual models. Experimental results show the effectiveness of the proposed multi-engine approach, with overall Recall, Precision and F-Score values of 93.98%, 90.63% and 92.28%, respectively. This is an improvement of 14.92% in F-Score over the best-performing baseline, the SVM-based system, and of 18.36% in F-Score over the lowest-performing baseline, the ME-based system. Comparative evaluation results also show that the proposed system outperforms the three other existing Bengali NER systems.
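The abstract does not say how the three weighted voting schemes are defined, so the following Python sketch shows only the general idea of token-level weighted voting, not the authors' method. The function name `weighted_vote`, the tag abbreviations (`PER`, `LOC`, `ORG`, `MISC`, `O`), the example predictions and the weights are all hypothetical; in practice a weight might be derived from each classifier's score on held-out data.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-token tags from several classifiers by weighted voting.

    predictions: dict mapping classifier name -> list of tags (one per token)
    weights:     dict mapping classifier name -> non-negative vote weight
    Returns the list of winning tags, one per token.
    """
    names = list(predictions)
    n_tokens = len(predictions[names[0]])
    if any(len(predictions[name]) != n_tokens for name in names):
        raise ValueError("all classifiers must tag the same number of tokens")

    combined = []
    for i in range(n_tokens):
        # Sum the weights of the classifiers that voted for each tag.
        scores = defaultdict(float)
        for name in names:
            scores[predictions[name][i]] += weights[name]
        # Highest total weight wins; ties go to the alphabetically first tag
        # so that the result is deterministic.
        combined.append(max(sorted(scores), key=lambda tag: scores[tag]))
    return combined


# Hypothetical outputs of three classifiers on a four-token sentence.
predictions = {
    "ME": ["PER", "O", "LOC", "O"],
    "CRF": ["PER", "O", "ORG", "O"],
    "SVM": ["O", "O", "ORG", "MISC"],
}
# Hypothetical weights; these are not values from the paper.
weights = {"ME": 0.5, "CRF": 0.6, "SVM": 0.7}

print(weighted_vote(predictions, weights))
```

With these made-up inputs, the third token is tagged `ORG` because the combined weight of CRF and SVM exceeds that of ME. A single classifier with a large enough weight can also override a majority of lower-weighted classifiers, which is what distinguishes weighted voting from simple majority voting.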

Journal: Northern European Journal of Language Technology
Publication Date: Feb 2, 2010
Citations: 9

Similar Papers

A Multiengine NER System with Context Pattern Learning and Post-processing Improves System Performance
Asif Ekbal ... Sivaji Bandyopadhyay
International Journal of Computer Processing of Languages | VOL. 22
01 Jun 2009

Named Entity Recognition in Indian Languages Using Maximum Entropy Approach
Asif Ekbal ... Sivaji Bandyopadhyay
International Journal of Computer Processing of Languages | VOL. 21
01 Sep 2008

Named Entity Recognition using Support Vector Machine: A Language Independent Approach
...
Zenodo (CERN European Organization for Nuclear Research) | VOL. -
23 Mar 2010

Named entity recognition in Bengali and Hindi using support vector machine
Asif Ekbal ... Sivaji Bandyopadhyay
Lingvisticæ Investigationes | VOL. 34
07 Jul 2011
