Abstract

Named Entity Recognition (NER) is widely used as a foundation for Natural Language Processing (NLP) applications. There have been few previous attempts at building generic NER systems for the Tamil language. These attempts were based on machine-learning approaches such as Hidden Markov Models (HMM), Maximum Entropy Markov Models (MEMM), Support Vector Machines (SVM) and Conditional Random Fields (CRF). Among them, CRF has been shown to give the best accuracy for Tamil NER. This paper presents a novel approach to building a Tamil NER system using the Margin-Infused Relaxed Algorithm (MIRA). We also compare the performance of the MIRA and CRF algorithms for Tamil NER. When gazetteer, POS tag and orthographic features are used, the MIRA algorithm attains an F1-measure of 81.38% on the Tamil BBC news data, whereas the CRF algorithm attains an F1-measure of only 79.13% with the same set of features. Our NER system outperforms all previous NER systems for the Tamil language.
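For readers unfamiliar with MIRA, the following is a minimal sketch of one single-best, clipped MIRA update in its commonly described form. It is not the implementation evaluated in this paper. The function name `mira_update`, the sparse dictionary feature representation, the feature names and the clipping constant `C` are all assumptions made for illustration.

```python
def mira_update(weights, gold_feats, pred_feats, loss, C=1.0):
    """Return weights after one single-best MIRA step (illustrative sketch).

    weights, gold_feats, pred_feats: dicts mapping feature name -> float.
    loss: cost of the predicted labelling, e.g. number of wrong tags.
    C: upper bound on the step size (hypothetical default).
    """
    # Feature difference between the gold and the predicted labelling.
    keys = set(gold_feats) | set(pred_feats)
    diff = {k: gold_feats.get(k, 0.0) - pred_feats.get(k, 0.0) for k in keys}
    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0.0:
        # Identical feature vectors: nothing to update.
        return dict(weights)

    # Current score margin of gold over predicted under the old weights.
    margin = sum(weights.get(k, 0.0) * v for k, v in diff.items())

    # Smallest step that makes the margin at least the loss, clipped to [0, C].
    tau = max(0.0, min(C, (loss - margin) / norm_sq))

    updated = dict(weights)
    for k, v in diff.items():
        updated[k] = updated.get(k, 0.0) + tau * v
    return updated


# Toy example with made-up orthographic features for a single token.
w = mira_update({}, {"capitalised&PER": 1.0}, {"capitalised&O": 1.0}, loss=1.0)
```

In this toy call the step size is 0.5, so after the update the gold labelling outscores the predicted one by exactly the loss, which is the margin constraint that MIRA enforces with the smallest possible change to the weights.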
