Abstract

Part-of-Speech (PoS) tagging is the task of assigning the appropriate morphosyntactic category to each word according to its context. Several probabilistic methods have been adapted for PoS tagging, such as Conditional Random Fields, Support Vector Machines, and Decision Trees. Based on these methods, language-independent PoS taggers have been developed, such as CRF++, YamCha, and TreeTagger. These taggers model the morphosyntactic structure of natural language text in order to assign the correct PoS (noun, verb, adjective, adverb, …) to each word of a sentence. In this paper, we aim to improve the accuracy of existing Amazigh PoS taggers using a voting algorithm. The three Amazigh PoS taggers used are: (1) a Conditional Random Fields (CRF) tagger, (2) a Support Vector Machines (SVM) tagger, and (3) TreeTagger (TT). These taggers achieve accuracies of 86.79 %, 84.64 %, and 86.57 %, respectively. All three are trained on an annotated corpus of 60,000 words. An error analysis is carried out to identify the mistakes made by each tagger. A voting algorithm is then proposed to combine them into a single Amazigh PoS tagger, which reaches an accuracy of 89.06 %. The resulting tagger could be used in a variety of NLP applications, giving students and researchers an opportunity to work with language data using a range of computational tools and techniques.
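
The abstract does not spell out the voting scheme, so the following is only a minimal sketch of one common choice: per-token majority voting over the three taggers' outputs, with disagreements resolved in favor of the tagger with the highest reported accuracy. The function names, the `PRIORITY` ordering, and the tag labels are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

# Assumed tie-break order: taggers ranked by the accuracies reported in the
# abstract (CRF 86.79 % > TT 86.57 % > SVM 84.64 %). This ordering is a
# hypothetical design choice, not taken from the paper.
PRIORITY = ("crf", "tt", "svm")


def vote(tags):
    """Pick one tag for a single token.

    `tags` maps a tagger name ("crf", "svm", "tt") to the tag it proposed.
    The most frequent tag wins; if several tags share the top count, the
    proposal of the highest-priority tagger among them is returned.
    """
    counts = Counter(tags.values())
    top = max(counts.values())
    candidates = {tag for tag, n in counts.items() if n == top}
    if len(candidates) == 1:
        return next(iter(candidates))
    for name in PRIORITY:
        if tags.get(name) in candidates:
            return tags[name]
    raise ValueError("no known tagger supplied a tag")


def vote_sentence(crf_tags, svm_tags, tt_tags):
    """Combine three aligned tag sequences for one sentence, token by token."""
    if not (len(crf_tags) == len(svm_tags) == len(tt_tags)):
        raise ValueError("tag sequences must have the same length")
    return [
        vote({"crf": c, "svm": s, "tt": t})
        for c, s, t in zip(crf_tags, svm_tags, tt_tags)
    ]


if __name__ == "__main__":
    # Illustrative tag labels only; the real Amazigh tagset is not given here.
    crf = ["NOUN", "VERB", "ADJ"]
    svm = ["NOUN", "NOUN", "ADV"]
    tt = ["VERB", "NOUN", "PREP"]
    print(vote_sentence(crf, svm, tt))
```

In this sketch, the first token is decided by a two-to-one majority, while the third token, on which all taggers disagree, falls back to the CRF proposal. Weighted voting, where each vote counts in proportion to the tagger's per-tag precision, would be a natural alternative under the same interface.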
