Abstract
Natural Language Processing (NLP) is mainly concerned with developing computational models and tools for aspects of human (natural) language processing. Part-of-Speech (POS) tagging is a well-studied topic and one of the most fundamental preprocessing steps in NLP for any language. NLP for Nepali, however, still lacks significant research effort in India. POS tagging is a necessary component of most Nepali NLP applications: it analyses the construction and behavior of the language and can be used to develop automated language-processing tools. A survey of the literature and related work shows that little work has been done on POS tagging for Nepali in India, owing to the lack of a comprehensive tagged corpus or correct handwritten rules. This paper discusses Hidden Markov Model (HMM) based POS tagging for Nepali. The HMM is the most widely used statistical model for POS tagging and requires little knowledge about the language beyond contextual information. The tagger was evaluated on corpora collected from TDIL (Technology Development for Indian Languages) using the BIS tagset of 42 tags, which was designed to meet the morpho-syntactic requirements of Nepali. The implementation uses the Python programming language and the NLTK (Natural Language Toolkit) library. The tagger achieves an accuracy of over 96% for known words; work on unknown words is still in progress.
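To illustrate the HMM approach summarized above, the following is a minimal sketch of a bigram HMM tagger with Viterbi decoding. It is not the implementation evaluated in this paper, which uses NLTK. To stay self-contained, the sketch uses only the Python standard library, a toy English corpus, and placeholder tags (DET, N, V) rather than the TDIL corpora and the BIS tagset. The function names `train_hmm`, `log_prob`, and `viterbi` are hypothetical, and add-one smoothing is an assumption made here so that unseen words and transitions receive a nonzero probability.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # pseudo-tag marking the beginning of a sentence


def train_hmm(tagged_sents):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] -> count
    emit = defaultdict(Counter)   # emit[tag][word] -> count
    for sent in tagged_sents:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def log_prob(counter, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log probability of `key` under `counter`."""
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))


def viterbi(words, trans, emit):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    tags = sorted(emit)
    vocab = {w for counts in emit.values() for w in counts}
    n_words = len(vocab) + 1  # one extra outcome reserved for unseen words
    n_tags = len(tags)

    # columns[i][tag] = (best log score ending in tag at position i, previous tag)
    columns = [{
        t: (log_prob(trans[START], t, n_tags)
            + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for word in words[1:]:
        column = {}
        for t in tags:
            e = log_prob(emit[t], word, n_words)
            column[t] = max(
                (columns[-1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        columns.append(column)

    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Toy training data with placeholder tags (not the BIS tagset).
TRAIN = [
    [("the", "DET"), ("dog", "N"), ("runs", "V")],
    [("a", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("the", "DET"), ("cat", "N"), ("runs", "V")],
]

trans, emit = train_hmm(TRAIN)
print(viterbi(["a", "dog", "sleeps"], trans, emit))
```

In this sketch a word absent from the training data receives the same smoothed emission probability under every tag, so its tag is decided by the transition probabilities alone. This shows why unknown words are harder for an HMM tagger than known words.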