Feature-rich PoS Tagging through Taggers Combination: Experience in Arabic

Imad Zeroual, Abdelhak Lakhouaja

doi:10.14738/tmlai.54.2981

Abstract

Since words can play different syntactic roles in different contexts, assigning the appropriate morphosyntactic category to each word according to its context is not trivial. Part-of-Speech (PoS) tagging is the task that addresses this issue. Several probabilistic methods have been adapted for PoS tagging, such as Hidden Markov Models, Support Vector Machines, and Decision Trees. Based on these methods, language-independent PoS taggers have been developed, such as TnT, SVMTool, and TreeTagger. The main purpose of this work is to automatically combine the output of these standard PoS taggers and to investigate several options for performing this combination. The experiments are applied to Arabic, a morphologically complex language. In this paper, we highlight the use of these taggers through various experiments. The evaluations involve several tests on both Classical and Modern Standard Arabic, using trained/untrained and tagged/untagged data. Finally, we perform a deeper investigation of Arabic PoS tagging through the combination of these language-independent taggers.
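As an illustration of what combining tagger outputs can look like, the following minimal Python sketch applies a per-token majority vote. The abstract does not state which combination schemes the paper evaluates, so majority voting is an assumption chosen here only because it is a common baseline. The function name `combine_by_majority`, the `fallback` parameter, and the tag labels are hypothetical, and the tag sequences are made-up placeholders rather than real output from TnT, SVMTool, or TreeTagger.

```python
from collections import Counter


def combine_by_majority(tagger_outputs, fallback=0):
    """Combine per-token tags from several taggers by majority vote.

    tagger_outputs: list of tag sequences, one per tagger, all of equal length.
    fallback: index of the tagger whose tag is used when the top vote is tied
              (a hypothetical tie-breaking rule, not taken from the paper).
    """
    lengths = {len(seq) for seq in tagger_outputs}
    if len(lengths) != 1:
        raise ValueError("all taggers must tag the same token sequence")

    combined = []
    for votes in zip(*tagger_outputs):
        ranked = Counter(votes).most_common()
        # A tie exists when the two most frequent tags have the same count.
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            combined.append(votes[fallback])
        else:
            combined.append(ranked[0][0])
    return combined


# Placeholder outputs for a three-token sentence (illustrative tags only).
tagger_a = ["NOUN", "VERB", "PREP"]
tagger_b = ["NOUN", "NOUN", "PREP"]
tagger_c = ["ADJ", "VERB", "PART"]

print(combine_by_majority([tagger_a, tagger_b, tagger_c]))
```

Each token receives the tag that most taggers agree on. When all taggers disagree, the sketch falls back to one designated tagger, which in practice would typically be the one with the best standalone accuracy.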
