Abstract

Named Entity Recognition (NER) classifies each word of a document into predefined named entity classes and is important for Natural Language Processing (NLP) tasks such as information retrieval, question answering, and machine translation. Mising is a Tibeto-Burman language spoken by over 500,000 Mising people living in Assam. It is a resource-constrained language: hardly any documents written in Mising are available on the web, so the authors developed their own corpus of 50K words. The authors used a tagset of 12 tags and a Support Vector Machine (SVM) classifier for NER in Mising. Mising is written in the Roman script, and, unlike in other Indian languages, named entities always start with a capital letter, which makes them easier to extract and classify. Under 5-fold cross-validation, the SVM-based NER system achieved an average precision, recall, and F-score of 85.14%, 90.58%, and 87.77%, respectively. As this is the first NER system for the language, the authors found no other systems to compare it with.
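As a quick sanity check on the reported figures, the F-score can be recomputed from the average precision and recall. This is a minimal sketch that assumes the F-score is the balanced F1 (the harmonic mean of precision and recall); the abstract does not state this, and the function name `f_score` is illustrative rather than taken from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1: the harmonic mean of precision and recall.

    Assumption: the abstract's "F-Score" is F1 (beta = 1).
    Inputs and output are percentages.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Average precision and recall as reported in the abstract.
reported_precision = 85.14
reported_recall = 90.58

f1 = f_score(reported_precision, reported_recall)
print(round(f1, 2))  # 87.78
```

The recomputed value differs from the reported 87.77% by 0.01 percentage points. A gap of that size would be expected if the authors averaged the per-fold F-scores instead of computing a single F-score from the averaged precision and recall, or if the figure was truncated; the abstract does not say which.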
