Abstract

Part-of-speech (POS) tagging for Indian languages such as Hindi and Marathi remains largely unexplored. Some of the best taggers available for Indian languages combine machine learning or stochastic techniques with phonetic information. The available corpora for Hindi and Marathi are limited, so Natural Language Processing (NLP) applied to Hindi and Marathi sentences often fails to achieve the desired results. In particular, current POS tagging techniques assign an UNKNOWN (UNK) tag to words that are not present in the corpus. This paper proposes extending a Hidden Markov Model (HMM)-based approach to POS tagging with the Naive Bayes theorem to predict the POS tag of such UNK words.
