Abstract
This paper proposes a part-of-speech (POS) tagging system for the South Indian language Kannada using supervised machine learning. POS tagging is an important step in Natural Language Processing, with applications such as word sense disambiguation and natural language understanding. Based on an extensive review of methods used for POS tagging, Conditional Random Fields (CRFs) were chosen as the algorithm. CRFs are used for sequence modeling in tasks such as POS tagging and named entity recognition, and serve as an alternative to Hidden Markov Models. Three very large corpora are used and their results are compared, with the feature set varied for each corpus. The best method for the task is determined from these results.
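To make the idea of a CRF feature set concrete, the following is a minimal sketch of per-token feature extraction. It is not the feature set used in the paper, which this abstract does not list. The function names `word_features` and `sentence_features`, the chosen features (prefixes, suffixes, neighbouring words), and the romanized example tokens are all assumptions made for illustration. Feature dictionaries of this shape are the kind of input that CRF toolkits such as sklearn-crfsuite typically accept, one dictionary per token.

```python
from typing import Dict, List, Union

Feature = Union[str, bool, int]


def word_features(sentence: List[str], i: int) -> Dict[str, Feature]:
    """Build an illustrative feature dict for the token at position i.

    The feature choices here are assumptions, not the paper's feature set.
    """
    word = sentence[i]
    features: Dict[str, Feature] = {
        "word": word,
        "prefix2": word[:2],    # first two characters
        "suffix2": word[-2:],   # last two characters
        "suffix3": word[-3:],   # last three characters
        "length": len(word),
        "is_digit": word.isdigit(),
        "is_first": i == 0,
        "is_last": i == len(sentence) - 1,
    }
    # Context features: the neighbouring tokens, when they exist.
    if i > 0:
        features["prev_word"] = sentence[i - 1]
    if i < len(sentence) - 1:
        features["next_word"] = sentence[i + 1]
    return features


def sentence_features(sentence: List[str]) -> List[Dict[str, Feature]]:
    """Return one feature dict per token in the sentence."""
    return [word_features(sentence, i) for i in range(len(sentence))]


# Hypothetical romanized tokens, used only to exercise the functions.
sentence = ["avanu", "shalege", "hoguttane"]
feats = sentence_features(sentence)
```

Varying the feature set, as the abstract describes, would then amount to adding or removing keys in `word_features` and retraining the tagger on each corpus.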
Highlights
Part-of-speech tagging is a fundamental task in Natural Language Processing and Computational Linguistics
This paper proposes a part-of-speech tagging system for the South Indian language Kannada using supervised machine learning
Part-of-speech (POS) tagging is an important step in Natural Language Processing, with applications such as word sense disambiguation and natural language understanding
Summary
Part-of-speech tagging is a fundamental task in Natural Language Processing and Computational Linguistics. Part-of-speech tags are frequently used as an important feature for other natural language processing tasks such as word-sense disambiguation, named entity recognition, information retrieval, and machine translation. The Sanskrit grammarian Yaska defined only four categories in his 5th century BC work, Nirukta. These are nama, which includes nouns and adjectives; akhyata, or verb; upasarga, which is a pre-verb or prefix; and nipata, or particle. The Brown Corpus, one of the first English-language corpora created for processing by a computer, uses 87 tags. Part-of-speech tags help in parsing, in word-sense disambiguation algorithms, and in shallow parsing to find names, times, dates, or other named entities in information extraction applications.
More From: International Journal of Engineering & Technology