Hybrid Part-Of-Speech Tagger for Non-Vocalized Arabic Text

Meryeme Hadni, Said Alaoui Ouatik, Abdelmonaime Lachkar, Mohammed Meknassi

doi:10.5121/ijnlc.2013.2601

Open Access

https://doi.org/10.5121/ijnlc.2013.2601

Abstract

Part-of-speech (POS) tagging plays a crucial role in many areas of natural language processing (NLP), including speech recognition, natural language parsing, information retrieval, and multi-word term extraction. This paper proposes an efficient and accurate POS tagging technique for Arabic based on a hybrid approach. Because of ambiguity, the Arabic rule-based method suffers from misclassified and unanalyzed words. To overcome these two problems, we propose a Hidden Markov Model (HMM) integrated with the Arabic rule-based method. Our POS tagger uses a set of three POS tags: Noun, Verb, and Particle. The proposed technique uses the contextual information of words together with a variety of features that help predict the POS classes. To evaluate its accuracy, the method was trained and tested on two corpora of undiacritized Classical Arabic: the Holy Quran Corpus and the Kalimat Corpus. The experimental results demonstrate the effectiveness of our method for Arabic POS tagging. On the Holy Quran Corpus, the accuracy rates are 97.6%, 96.8%, and 94.4% for our Hybrid Tagger, the HMM Tagger, and the Rule-Based Tagger, respectively. On the Kalimat Corpus, they are 94.60%, 97.40%, and 98% for the Rule-Based Tagger, the HMM Tagger, and our Hybrid Tagger, respectively.

Highlights

Part-of-speech (POS) tagging is a necessary step in many natural language processing (NLP) systems, such as information extraction, text parsing, and semantic processing
A suitable Hidden Markov Model (HMM) architecture was specified based on sentence structure, which allows us to correctly handle the ambiguity behind the misclassified and unanalyzed words of the Arabic rule-based method
Two corpora composed of traditional texts of Classical Arabic (CA) were used: the Quranic Arabic Corpus and the Kalimat Corpus
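
The hybrid idea described in the abstract and highlights can be illustrated with a minimal sketch. It is not the authors' implementation: the rule-based analyzer is replaced by a hypothetical lookup table (`RULE_LEXICON`), all probabilities (`START`, `TRANS`, `EMIT`, `UNSEEN`) are invented toy values, and the words are rough Latin transliterations chosen for illustration. The sketch assumes that the rule-based output restricts which of the three tags are allowed for each word, and that HMM Viterbi decoding picks among the remaining tags, falling back to all three tags when a word is unanalyzed.

```python
import math

TAGS = ("Noun", "Verb", "Particle")

# Toy HMM parameters (assumed values, not taken from the paper).
START = {"Noun": 0.5, "Verb": 0.3, "Particle": 0.2}
TRANS = {
    "Noun": {"Noun": 0.4, "Verb": 0.2, "Particle": 0.4},
    "Verb": {"Noun": 0.6, "Verb": 0.05, "Particle": 0.35},
    "Particle": {"Noun": 0.7, "Verb": 0.25, "Particle": 0.05},
}
EMIT = {
    "Noun": {"dhhb": 0.02, "alwld": 0.05},
    "Verb": {"dhhb": 0.06},
    "Particle": {"ila": 0.3},
}
UNSEEN = 1e-6  # floor probability for words absent from EMIT

# Hypothetical stand-in for a rule-based analyzer: word -> allowed tags.
RULE_LEXICON = {
    "dhhb": {"Noun", "Verb"},   # ambiguous: "gold" or "go"
    "alwld": {"Noun"},
    "ila": {"Particle"},
}


def rule_based_candidates(word):
    """Return the tags proposed by the rules; an empty set means unanalyzed."""
    return RULE_LEXICON.get(word, set())


def hybrid_tag(words):
    """Viterbi decoding restricted to the rule-based candidate tags."""
    if not words:
        return []
    # Unanalyzed words fall back to the full tag set.
    allowed = [rule_based_candidates(w) or set(TAGS) for w in words]

    def emit(tag, word):
        return math.log(EMIT[tag].get(word, UNSEEN))

    # best[i][tag] = (log-probability of best path ending in tag, previous tag)
    best = [{t: (math.log(START[t]) + emit(t, words[0]), None)
             for t in TAGS if t in allowed[0]}]
    for i in range(1, len(words)):
        column = {}
        for t in TAGS:
            if t not in allowed[i]:
                continue
            prev, score = max(
                ((p, s + math.log(TRANS[p][t])) for p, (s, _) in best[-1].items()),
                key=lambda item: item[1],
            )
            column[t] = (score + emit(t, words[i]), prev)
        best.append(column)

    # Backtrack from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


print(hybrid_tag(["dhhb", "alwld", "ila", "almdrsa"]))
```

In this toy run the ambiguous first word is resolved by the following noun, and the last word, which the lookup table does not cover, is tagged from the transition probabilities alone. A real system would estimate the HMM parameters from a tagged corpus.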

Summary

Introduction

Part-of-speech (POS) tagging is a necessary step in many natural language processing (NLP) systems, such as information extraction, text parsing, and semantic processing. POS tagging assigns grammatical tags to the words and symbols that make up a text; these tags carry a large amount of lexical information and capture the relationship between each word and its adjacent, related words in a sentence or paragraph [1][2][3]. Arabic POS tagging is the process of identifying the lexical category of an Arabic word in a sentence based on its context [5]. The most commonly used categories are noun, adverb, verb, and adjective. The category is assigned on the basis of the word's role, both individually and within the sentence. Take for example the term "ذهب", which can be treated as the noun "gold" or the verb "go".
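
To make the ambiguity concrete, the following sketch shows how two vocalized words collapse into one non-vocalized form. It assumes the example word is the three-letter form dh-h-b (inferred from the glosses "gold" and "go"); the function name `strip_diacritics` is hypothetical, and the character range covers only the common Arabic diacritics U+064B to U+0652.

```python
import re

# Tanwin, short vowels, shadda and sukun: U+064B through U+0652.
DIACRITICS = re.compile("[\u064B-\u0652]")


def strip_diacritics(text):
    """Remove Arabic diacritics, producing non-vocalized text."""
    return DIACRITICS.sub("", text)


verb = "\u0630\u064E\u0647\u064E\u0628\u064E"  # dhahaba, "he went"
noun = "\u0630\u064E\u0647\u064E\u0628\u064C"  # dhahabun, "gold"

# Distinct when vocalized, identical once the diacritics are removed.
print(verb != noun)
print(strip_diacritics(verb) == strip_diacritics(noun))
```

Because both vocalized words map to the same string, a tagger working on non-vocalized text has to rely on the surrounding words to choose between Noun and Verb.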

Journal: International Journal on Natural Language Computing
Publication Date: Dec 31, 2013
Citations: 32
License type: cc-by

Similar Papers

A hybrid part-of-speech tagger with annotated Kurdish corpus: advancements in POS tagging
Dastan Maulud ... Ismael Ali
Digital Scholarship in the Humanities | VOL. 38
05 Oct 2023

Graph-based Natural Language Processing and Information Retrieval
Rada Mihalcea ... Dragomir Radev
11 Apr 2011

Improving Rule-Based Method for Arabic POS Tagging Using HMM Technique
Meryeme Hadni ... Abdelmonaime Lachkar
02 Nov 2013

Kernel based part of speech tagger for Kannada
P J Antony ... K P Soman
01 Jul 2010
