Abstract

Part-of-speech (POS) tagging plays a crucial role in many areas of natural language processing (NLP), including speech recognition, natural language parsing, information retrieval, and multi-word term extraction. This paper proposes an efficient and accurate POS tagging technique for Arabic based on a hybrid approach. Because of ambiguity, the Arabic rule-based method suffers from misclassified and unanalyzed words. To overcome these two problems, we integrate a Hidden Markov Model (HMM) with the Arabic rule-based method. Our POS tagger assigns one of three POS tags: Noun, Verb, or Particle. The proposed technique uses the contextual information of words together with a variety of features that help predict the POS classes. To evaluate its accuracy, the proposed method was trained and tested on two corpora of undiacritized Classical Arabic: the Holy Quran Corpus and the Kalimat Corpus. The experimental results demonstrate the efficiency of our method for Arabic POS tagging. On the Holy Quran Corpus, the accuracy rates are 97.6%, 96.8%, and 94.4% for our Hybrid Tagger, the HMM Tagger, and the Rule-Based Tagger, respectively. On the Kalimat Corpus, they are 98%, 97.40%, and 94.60% for the Hybrid Tagger, the HMM Tagger, and the Rule-Based Tagger, respectively.

Highlights

  • Part-of-speech (POS) tagging is a necessary step in many natural language processing (NLP) systems, such as information extraction, text parsing, and semantic processing

  • A suitable Hidden Markov Model (HMM) architecture was specified based on sentence structure, which allows us to correctly handle the ambiguity behind misclassified and unanalyzed words in the Arabic rule-based method

  • Two corpora composed of traditional Classical Arabic (CA) texts were used: the Quranic Arabic Corpus and the Kalimat Corpus



Introduction

Part-of-speech (POS) tagging is a necessary step in many natural language processing (NLP) systems, such as information extraction, text parsing, and semantic processing. POS tagging assigns grammatical tags to the words and symbols that make up a text; these tags carry a large amount of lexical information and capture the relationship between each word and its adjacent, related words in a sentence or paragraph [1][2][3]. Arabic POS tagging is the process of identifying the lexical category of an Arabic word in a sentence based on its context [5]. The most commonly used categories are noun, adverb, verb, and adjective. The category is assigned on the basis of the word's role, both individually and within the sentence. Take, for example, the term "ذهب": it can be treated as the noun "gold" or as the verb "go".
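To illustrate how context can resolve this kind of ambiguity, the following is a minimal sketch of Viterbi decoding over the three tags Noun, Verb, and Particle. It is not the authors' implementation: the transition and start probabilities, the tiny lexicon, and the names `LEXICON`, `emission`, and `viterbi` are all hypothetical, chosen by hand for the example. Using a lexicon of candidate tags to constrain the HMM is only one possible way to combine a rule-based analysis with an HMM.

```python
import math

TAGS = ["Noun", "Verb", "Particle"]

# Hypothetical, hand-set probabilities (not estimated from any corpus).
START = {"Noun": 0.3, "Verb": 0.5, "Particle": 0.2}
TRANS = {
    "Noun": {"Noun": 0.4, "Verb": 0.2, "Particle": 0.4},
    "Verb": {"Noun": 0.7, "Verb": 0.1, "Particle": 0.2},
    "Particle": {"Noun": 0.9, "Verb": 0.05, "Particle": 0.05},
}

# Toy stand-in for a rule-based analysis: candidate tags per word.
# The ambiguous word keeps two candidates; unknown words keep all tags.
LEXICON = {
    "ذهب": {"Noun", "Verb"},   # "gold" or "go"
    "الولد": {"Noun"},          # "the boy"
    "المدرسة": {"Noun"},        # "the school"
    "خاتم": {"Noun"},           # "ring"
    "إلى": {"Particle"},        # "to"
    "من": {"Particle"},         # "of / from"
}


def emission(word, tag):
    """Score 1.0 for tags the lexicon allows, a tiny value otherwise."""
    candidates = LEXICON.get(word, set(TAGS))
    return 1.0 if tag in candidates else 1e-6


def viterbi(words):
    """Return the most probable tag sequence for `words` (log-space Viterbi)."""
    if not words:
        return []
    best = [{t: math.log(START[t]) + math.log(emission(words[0], t)) for t in TAGS}]
    back = []
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in TAGS:
            # Best previous tag for reaching tag t at this position.
            prev = max(TAGS, key=lambda p: best[-1][p] + math.log(TRANS[p][t]))
            scores[t] = best[-1][prev] + math.log(TRANS[prev][t]) + math.log(emission(word, t))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


# Sentence-initial, followed by a noun: tagged as a verb.
print(viterbi(["ذهب", "الولد", "إلى", "المدرسة"]))  # ['Verb', 'Noun', 'Particle', 'Noun']
# After a particle: tagged as a noun.
print(viterbi(["خاتم", "من", "ذهب"]))               # ['Noun', 'Particle', 'Noun']
```

In this toy setup the same surface form receives different tags purely because of the neighboring tags, which is the behavior the "gold" versus "go" example describes. A real tagger would estimate the probabilities from an annotated corpus rather than fix them by hand.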

