Abstract

This paper proposes a part-of-speech (POS) tagging system for Kannada, a South Indian language, using supervised machine learning. POS tagging is an important step in Natural Language Processing, with applications such as word sense disambiguation and natural language understanding. After an extensive review of methods used for POS tagging, we chose Conditional Random Fields (CRFs) as our algorithm. CRFs are used for sequence modeling in tasks such as POS tagging and named entity recognition, and serve as an alternative to Hidden Markov Models. Three very large corpora are used and their results are compared; the feature set is also varied for each corpus. The best method for the task is determined from these results.

Summary

Introduction

Part-of-speech tagging is a fundamental task in Natural Language Processing and Computational Linguistics. Part-of-speech tags are frequently used as an important feature for other natural language processing tasks such as word-sense disambiguation, named entity recognition, information retrieval, and machine translation. The Sanskrit grammarian Yaska defined only four categories in his 5th century BC work, Nirukta: nama, which includes nouns and adjectives; akhyata, or verb; upasarga, a pre-verb or prefix; and nipata, or particle. By contrast, the Brown Corpus, one of the first English-language corpora created for processing by a computer, uses 87 tags. Part-of-speech tags help in parsing, in word-sense disambiguation algorithms, and in shallow parsing to find names, times, dates, or other named entities in information extraction applications.
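
The abstract describes training a CRF tagger with varied feature sets. As a rough illustration of what a per-token feature set can look like, the sketch below builds one feature dictionary per token. It is not the authors' feature set: the function names, feature names, boundary markers, and the transliterated example sentence are all assumptions made for illustration. Suffix features are included because they are a common choice for morphologically rich languages.

```python
def token_features(tokens, i, suffix_lengths=(1, 2, 3)):
    """Build a feature dict for tokens[i].

    All feature names here are hypothetical, chosen for illustration only.
    """
    word = tokens[i]
    feats = {
        "bias": 1.0,
        "word": word,
        "is_digit": word.isdigit(),
        "is_first": i == 0,
        "is_last": i == len(tokens) - 1,
    }
    # Word-final character sequences; only added when the word is longer
    # than the suffix, so the suffix never equals the whole word.
    for n in suffix_lengths:
        if len(word) > n:
            feats[f"suffix{n}"] = word[-n:]
    # Neighbouring words give the model local context; the boundary
    # markers <BOS>/<EOS> are assumed placeholders.
    feats["prev_word"] = tokens[i - 1] if i > 0 else "<BOS>"
    feats["next_word"] = tokens[i + 1] if i < len(tokens) - 1 else "<EOS>"
    return feats


def sentence_features(tokens, suffix_lengths=(1, 2, 3)):
    """Return one feature dict per token in the sentence."""
    return [token_features(tokens, i, suffix_lengths) for i in range(len(tokens))]


# Illustrative, transliterated three-word sentence (an assumption, not corpus data).
example = ["naanu", "shaalege", "hooguttene"]
for feats in sentence_features(example):
    print(feats)
```

Varying `suffix_lengths`, or adding and removing entries in the dictionary, is one simple way to compare feature sets across corpora. A list of such dictionaries per sentence is the kind of input that CRF toolkits typically accept alongside the gold tag sequence.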

