Abstract
Part of Speech (PoS) tagging is the task to assign the appropriate morphosyntactic category to each word according to the context. Several probabilistic methods have been adapted for PoS tagging such as Conditional Random Fields, Support Vector Machines, and Decision Trees. Based on these methods, language independent PoS taggers have been developed such as CRF++, Yamcha and TreeTagger. These POS taggers implement the process of assigning the correct PoS (noun, verb, adjective, adverb …) to each word of the sentence. PoS taggers are developed by modeling the morphosyntactic structure of natural language text. In this paper, we tried to improve the accuracy of existing Amazigh POS taggers using a voting algorithm. The three used Amazigh POS taggers are: (1) Conditional Random Fields (CRF) tagger (2) Support Vector Machines (SVM) tagger (3) TreeTagger (TT). These taggers are developed with an accuracy of 86.79 %, 84.64 % and 86.57 % respectively. An annotated corpus of 60,000 words is used to form all these taggers. An error analysis is done to find out the mistakes made by these taggers. Then, a voting algorithm is proposed to construct an Amazigh PoS tagger to achieve better results and we can reach an accuracy of 89.06 %. This accurate POS tagger could be used for a variety of NLP applications to offer the students and the researchers an opportunity to work with language data with variety of tools and techniques in terms of computational procedures and programs.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.