Abstract

Although morphological analysis is a vital part of natural language processing applications, there are no definitive standards for evaluating and benchmarking Arabic morphological systems. This paper proposes assessment criteria that scrutinize the input, output, and architectural design of such systems, enabling researchers to evaluate and fairly compare them. When several state-of-the-art Arabic morphological analyzers were scored against the proposed criteria, even the best algorithm failed to achieve a reliable accuracy rate. Hence, this paper introduces an enhanced algorithm that resolves an inflected Arabic word by identifying its root, finding its pattern, and assigning its POS tag; it is intended to reduce search time considerably and to address the deficiencies identified by the assessment criteria. The proposed model applies semantic rules of the Arabic language on top of a hybrid sub-model based on two existing algorithms (Al-Khalil and IAMA rules). Evaluated with the proposed assessment criteria, the system showed improved efficiency and speed, achieving up to 1500 words per second on small texts and up to 3000 words per second on larger documents.

Highlights

  • Morphology in linguistics is concerned with the study of the structure of words[1]

  • Morphology is the branch of linguistics concerned with the forms words take in their different uses and constructions[2]. Arabic is one of the languages in which the derivational and inflectional systems can produce, from a single root, a large number of words, each with a specific pattern and meaning[3]

  • Assessing and evaluating Arabic morphological systems depends on the input words and the resulting output[12], measured against predefined criteria so that a given system's weaknesses and strengths can be studied in the search for an Arabic morphological analyzer free from mistakes. We apply these criteria to some of the existing available systems; this criticism does not detract from their value and effectiveness[20]

Summary

INTRODUCTION

Morphology in linguistics is concerned with the study of the structure of words[1]. In other words, morphology is the branch of linguistics concerned with the forms words take in their different uses and constructions[2]. Arabic is one of the languages in which the derivational and inflectional systems can produce, from a single root, a large number of words (lexical forms), each with a specific pattern and meaning[3]. A shortcoming of word-based analysis of the Arabic language is that it is sensitive to a lack of data and information about Arabic words and their morphemes. This is an important issue because aligned corpora are an expensive resource that is not abundantly available for many levels of language analysis. It is especially problematic for morphologically rich languages, where word stems are realized in many different surface forms, which further hinders higher levels of language analysis. We will adapt some major assessment criteria for measuring the advantages and drawbacks of any Arabic morphological system[10].
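The root-and-pattern behavior described above can be illustrated with a minimal Python sketch. It is not the algorithm proposed in this paper: it assumes triliteral roots, undiacritized text, and patterns written with the conventional placeholder letters ف ع ل, and the function names `apply_pattern` and `extract_root` are hypothetical.

```python
# Placeholder letters conventionally used to write Arabic patterns (fa, ayn, lam).
PLACEHOLDERS = ("ف", "ع", "ل")


def apply_pattern(root, pattern):
    """Derive a word by filling a triliteral root into a pattern."""
    if len(root) != 3:
        raise ValueError("this sketch handles triliteral roots only")
    # Map each placeholder letter to the corresponding root radical.
    table = str.maketrans(dict(zip(PLACEHOLDERS, root)))
    return pattern.translate(table)


def extract_root(word, pattern):
    """Return the root letters if `word` fits `pattern`, otherwise None."""
    if len(word) != len(pattern):
        return None
    radicals = {}
    for w, p in zip(word, pattern):
        if p in PLACEHOLDERS:
            # A placeholder position contributes one radical of the root.
            if radicals.setdefault(p, w) != w:
                return None
        elif w != p:
            # A fixed (non-placeholder) pattern letter must match exactly.
            return None
    return "".join(radicals[p] for p in PLACEHOLDERS if p in radicals)


if __name__ == "__main__":
    root = "كتب"  # k-t-b, the root associated with writing
    for pattern in ("فعل", "فاعل", "مفعول", "مفعل"):
        word = apply_pattern(root, pattern)
        print(pattern, word, extract_root(word, pattern))
```

With the root كتب, the four patterns yield كتب (wrote), كاتب (writer), مكتوب (written), and مكتب (office), and `extract_root` recovers the same root from each derived form. A real analyzer must additionally handle prefixes, suffixes, weak radicals, and ambiguity between competing patterns, none of which this sketch attempts.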

BACKGROUND
Output
Assessment behavior
Word Tokenizer
Word Analyzer
Word Segmentation
Stem Identification
Tool Word Analysis
Arabic Nouns Analysis
Arabic Verbs Analysis
Stem Refiner
Root Identification
EXPERIMENTS AND DISCUSSIONS
The stemming algorithms under evaluation
The methodology of the proposed system
The algorithms used in the proposed system
Findings
Conclusion and Future Research