Abstract

In this paper we report on an experimental rule-based Arabic tagger driven by syntactic and morphological rules. The tagger, named MTE Tagger, is built from the grammatical rules of the Arabic language. It requires no pre-tagged text. Apart from a small, simple lexicon that lists closed word classes such as demonstrative nouns, pronouns, and some particles, it relies entirely on extensive grammatical and structural rules. We compared the MTE Tagger with the Stanford tagger in terms of both accuracy and speed of execution, running both taggers on multiple datasets. To evaluate tagging accuracy, we manually prepared and annotated a set of Arabic text: 1226 sentences with close to 20,000 tokens, human annotated and verified to serve as a testbed. The results are encouraging. In both test runs the MTE tagger was slightly more accurate than the Stanford tagger (87.88% versus 86.67%), and the difference in speed was large: on average, the ratio of MTE to Stanford tagging time was 1:50. Further accuracy gains should be possible in future work as the rule set is optimized and integrated and as more properties of Arabic, such as end-of-word diacritization, are exploited.
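The abstract does not give MTE's actual rules, tagset, or lexicon, so the following is only a minimal sketch of the general approach it describes. A small closed-class lexicon is consulted first, then ordered morphological and syntactic rules assign tags, and no training data is used. The lexicon entries, the tag names (`DEM`, `PRON`, `PREP`, `PART`, `NOUN`, `VERB`), the rules themselves, and the helper functions `tag_token`, `tag_sentence`, and `accuracy` are all hypothetical. The sketch also assumes that accuracy means token-level accuracy, i.e. correctly tagged tokens divided by total tokens. The abstract does not state its exact metric.

```python
"""Toy rule-based Arabic POS tagger and token-accuracy check.

Hypothetical illustration only: the lexicon, tag names, and rules below are
assumptions for demonstration and are NOT the MTE Tagger's actual rule set.
"""

# Hypothetical closed-class lexicon (demonstratives, pronouns, particles).
LEXICON = {
    "هذا": "DEM", "هذه": "DEM", "ذلك": "DEM", "تلك": "DEM",
    "هو": "PRON", "هي": "PRON", "هم": "PRON", "أنا": "PRON", "نحن": "PRON",
    "في": "PREP", "من": "PREP", "إلى": "PREP", "على": "PREP", "عن": "PREP",
    "لم": "PART", "لن": "PART",
}

# Prefix letters of the Arabic imperfect verb (alif-hamza, nun, ya, ta).
IMPERFECT_PREFIXES = ("أ", "ن", "ي", "ت")


def tag_token(token, prev_tag, is_first):
    """Tag one token: lexicon lookup first, then ordered heuristic rules."""
    if token in LEXICON:
        return LEXICON[token]
    # Morphological rules: definite-article prefix or taa marbuta suffix.
    if token.startswith("ال") or token.endswith("ة"):
        return "NOUN"
    # Syntactic rules based on the previous token's tag.
    if prev_tag == "PREP":
        return "NOUN"  # a preposition governs a following noun
    if prev_tag == "PART":
        return "VERB"  # lam/lan are followed by an imperfect verb
    if prev_tag == "PRON" and token.startswith(IMPERFECT_PREFIXES) and len(token) >= 4:
        return "VERB"  # pronoun subject followed by an imperfect verb
    if is_first:
        return "VERB"  # verbal sentences usually open with a verb
    return "NOUN"  # fallback: nouns are the most frequent open class


def tag_sentence(tokens):
    """Tag tokens left to right, feeding each tag into the next decision."""
    tagged = []
    prev_tag = None
    for i, token in enumerate(tokens):
        tag = tag_token(token, prev_tag, i == 0)
        tagged.append((token, tag))
        prev_tag = tag
    return tagged


def accuracy(predicted, gold):
    """Token-level accuracy: fraction of positions where the tags agree."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold tag sequences must align")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


if __name__ == "__main__":
    for sentence in ("ذهب الولد إلى المدرسة", "هو يكتب الدرس"):
        print(tag_sentence(sentence.split()))
    # The toy rules have no adjective rule, so "مفيد" is mis-tagged as NOUN.
    predicted = [tag for _, tag in tag_sentence("هذا كتاب مفيد".split())]
    print(accuracy(predicted, ["DEM", "NOUN", "ADJ"]))
```

In this toy, "ذهب الولد إلى المدرسة" is tagged VERB, NOUN, PREP, NOUN. In "هذا كتاب مفيد" the adjective falls through to the noun fallback, so token accuracy against a gold sequence containing `ADJ` is 2/3. Any rule-based tagger improves the same way the abstract describes: by adding and refining rules rather than adding training data.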
