Abstract

Part of Speech (POS) tagging is one of the most common techniques used in natural language processing (NLP) applications and corpus linguistics. Various POS tagging tools have been developed for Arabic. These taggers differ in several respects, such as their modeling techniques, tag sets, and training and testing data. In this paper we conduct a comparative study of five Arabic POS taggers, namely Stanford Arabic, CAMeL Tools, Farasa, MADAMIRA and Arabic Linguistic Pipeline (ALP), examining their performance on text samples from Saudi novels. The testing data was extracted from different novels that represent different types of narration. Our main result indicates that the ALP tagger performs better than the others in this particular case, and that Adjective is the most frequently mistagged POS type, compared with Noun and Verb.

Highlights

  • The results showed that MADAMIRA performed very slightly worse than MADA when tagging Modern Standard Arabic (MSA) texts (95.9% as opposed to 96.1%), but it performed roughly half a percentage point better than MADA on the Egyptian dialect (EGY) (92.4% as opposed to 91.8%)

Summary

Introduction

Part of Speech (POS) tagging is the process of assigning each word in a text its appropriate grammatical category, using a set of tags [1,2,3]. This process is a critical step for many natural language processing (NLP) applications and for corpus linguistics, and it is seen as one of the initial procedures that directly influence the performance of subsequent text processing steps [1,4]. This sort of tagging is valuable for corpus linguistics because it helps resolve ambiguity in word categories and allows for more focused search results [5]. Each tagger has its own tag set, which is an essential element of any POS tagger.
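Because each tagger uses its own tag set, comparing taggers usually requires mapping their output onto shared categories before scoring. The following is a minimal sketch of that idea, not the procedure used in the paper: the tag names in `COARSE_MAP`, the `evaluate` helper, and the toy gold and predicted sequences are all hypothetical and do not reproduce any real tagger's tag set or any reported result.

```python
from collections import Counter

# Hypothetical mapping from tagger-specific tags to coarse categories.
# These tag names are illustrative only, not a real tagger's tag set.
COARSE_MAP = {
    "NN": "Noun",
    "noun": "Noun",
    "VBD": "Verb",
    "verb": "Verb",
    "JJ": "Adjective",
    "adj": "Adjective",
}


def evaluate(gold, predicted, tag_map=COARSE_MAP):
    """Return (accuracy, errors) for one tagger.

    gold      -- list of coarse gold categories, one per token
    predicted -- list of tagger-specific tags, one per token
    errors    -- Counter of mistagged tokens, keyed by gold category
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    errors = Counter()
    correct = 0
    for gold_tag, pred_tag in zip(gold, predicted):
        # Unknown tags are left unmapped, so they count as errors
        # unless they already equal the gold category.
        if tag_map.get(pred_tag, pred_tag) == gold_tag:
            correct += 1
        else:
            errors[gold_tag] += 1

    accuracy = correct / len(gold) if gold else 0.0
    return accuracy, errors


# Toy example (made-up data): one adjective is mistagged as a noun.
gold = ["Noun", "Verb", "Adjective", "Noun"]
pred = ["NN", "VBD", "NN", "noun"]
accuracy, errors = evaluate(gold, pred)
print(accuracy)               # 0.75
print(errors.most_common(1))  # [('Adjective', 1)]
```

Counting errors by gold category is one simple way to see which POS type a tagger mistags most often; a fuller comparison would also need consistent tokenization across taggers, which this sketch assumes.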
