This project examines Arabic word generation from a computational perspective, focusing on the computational generation and analysis of Arabic noun morphology. The work begins with a stem-based descriptive analysis of Arabic noun morphology that serves both linguistic description and computational formalization. It includes a thorough discussion of the inflectional and derivational systems, and it also covers the orthography of Arabic nouns, morphotactics, and morphophonemics. The work then presents a computational implementation of Arabic nouns based on a rule-based approach to computational morphology. The system is built with the NooJ toolkit, which supports both finite-state automata (FSA) and pushdown automata (PDA). The morphological generation and analysis system has three components: a lexicon, morphotactics, and rules. The lexicon component catalogs lexical items (indivisible words and affixes), the morphotactics component specifies ordering restrictions on morphemes, and the rules component maps lexical representations to surface representations and vice versa. Other rules, such as orthographic, morphophonemic, and morphological rules, are also stored as two-level rules. The core editable lexicon of lemmas that the system takes as input is drawn from three sources: the Buckwalter Arabic morphological analyzer lexicon, the Arramooz machine-readable dictionary, and the Alghani Azzahir dictionary. The system's output is a complete annotated lexicon of inflected noun forms, combined into a single type of finite-state transducer (FST). The resulting lexicon is then used for morphological analysis. Finally, the study evaluates the system using three widely used metrics: accuracy, precision, and recall. The evaluation consists of two empirical experiments.
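The three-component design described above (lexicon, morphotactics, rules) can be illustrated with a minimal Python sketch. This is not the NooJ implementation and does not use the paper's lexicon: the entries, tag names, and function names are all hypothetical, and words are written in a simplified Buckwalter-style transliteration in which `p` stands for ta marbuta. The single rule shown is the familiar orthographic change of ta marbuta to `t` before an attached enclitic.

```python
from itertools import product

# Lexicon (hypothetical toy entries): lemmas and clitics with feature tags.
LEMMAS = {"kitAb": "NOUN+MASC", "madrasap": "NOUN+FEM"}
PROCLITICS = {"": "", "Al": "DEF"}        # "" means the slot is empty
ENCLITICS = {"": "", "hA": "POSS_3FS"}

def morphotactics_ok(proclitic, enclitic):
    """Ordering/co-occurrence restriction: the definite article and a
    possessive enclitic are not attached to the same noun."""
    return not (proclitic == "Al" and enclitic != "")

def to_surface(proclitic, lemma, enclitic):
    """Rule component: map the lexical form to the surface form.
    Ta marbuta (p) is written as t when an enclitic follows."""
    if enclitic and lemma.endswith("p"):
        lemma = lemma[:-1] + "t"
    return proclitic + lemma + enclitic

def generate(lemma):
    """Return {surface form: tag string} for every licensed combination."""
    forms = {}
    for pro, enc in product(PROCLITICS, ENCLITICS):
        if not morphotactics_ok(pro, enc):
            continue
        tags = [t for t in (PROCLITICS[pro], LEMMAS[lemma], ENCLITICS[enc]) if t]
        forms[to_surface(pro, lemma, enc)] = "+".join(tags)
    return forms

def build_analyzer():
    """Invert generation: surface form -> list of (lemma, tags) analyses."""
    table = {}
    for lemma in LEMMAS:
        for surface, tags in generate(lemma).items():
            table.setdefault(surface, []).append((lemma, tags))
    return table
```

In this toy setup, `generate("madrasap")` yields the three forms `madrasap`, `madrasathA`, and `Almadrasap`, and the table returned by `build_analyzer()` maps each generated surface form back to its lemma and tags, mirroring the idea that the generated lexicon is reused for analysis.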
The first experiment evaluates the system on diacritized Arabic words, for which it achieves 90.4% accuracy, 98.3% precision, and 88.9% recall. The second experiment tests the system on undiacritized words, yielding 94.7% accuracy, 96.7% precision, and 91.6% recall. Averaged over the two experiments, the system achieves 92.55% accuracy, 97.5% precision, and 90.25% recall. Overall, the results are encouraging and demonstrate the system's ability to handle both diacritized and undiacritized Arabic text. The system can analyze Arabic text corpora in depth and tag nouns according to their morphological features. It splits each analyzed word into three parts (proclitics/prefixes, the stem, and suffixes/enclitics) and assigns each part a morphological feature tag, or several tags if the part contains multiple clitics or affixes. Many natural language processing applications, including parsing, lemmatization, stemming, part-of-speech (POS) tagging, corpus annotation, word sense disambiguation, machine translation, information retrieval, text generation, and spell checking, depend on computational morphology. Computational morphology comprises two paradigms: morphological generation and morphological analysis. Given a set of features, morphological generation attempts to produce every possible derived and inflected form of a given lemma. Morphological analysis, by contrast, decomposes a word into its component morphemes and assigns linguistic tags or features to each morpheme.
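As a rough illustration of morphological analysis as segmentation into proclitics/prefixes, stem, and suffixes/enclitics, the following Python sketch peels known clitics off an undiacritized word and tags each piece. It is a simplified stand-in, not the paper's NooJ grammar: the affix inventories, stem list, tag names, and functions are hypothetical, and words use Buckwalter-style transliteration.

```python
# Hypothetical toy inventories (undiacritized, Buckwalter-style transliteration).
# Proclitic slots are listed in their attachment order, outermost first.
PROCLITIC_SLOTS = [
    {"w": "CONJ", "f": "CONJ"},
    {"b": "PREP", "l": "PREP"},
    {"Al": "DEF"},
]
SUFFIXES = {"p": "FEM_SG", "At": "FEM_PL", "yn": "MASC_PL"}
STEMS = {"ktAb": "NOUN", "mElm": "NOUN"}

def peel_proclitics(word, slot=0):
    """Yield (tagged_proclitics, remainder) for every way of stripping
    at most one proclitic per slot, respecting the slot order."""
    if slot == len(PROCLITIC_SLOTS):
        yield [], word
        return
    # Option 1: leave this slot empty.
    yield from peel_proclitics(word, slot + 1)
    # Option 2: strip a proclitic that belongs to this slot.
    for form, tag in PROCLITIC_SLOTS[slot].items():
        if word.startswith(form):
            for rest_tags, rest in peel_proclitics(word[len(form):], slot + 1):
                yield [(form, tag)] + rest_tags, rest

def analyze(word):
    """Return all segmentations whose stem is in the toy lexicon."""
    analyses = []
    for proclitics, rest in peel_proclitics(word):
        # Try no suffix first, then each known suffix.
        for form, tag in [("", None)] + list(SUFFIXES.items()):
            if form and not rest.endswith(form):
                continue
            stem = rest[:len(rest) - len(form)]
            if stem in STEMS:
                analyses.append({
                    "proclitics": proclitics,
                    "stem": (stem, STEMS[stem]),
                    "suffixes": [(form, tag)] if form else [],
                })
    return analyses
```

For example, `analyze("wbAlktAb")` returns one analysis in which the prefix part carries three tags (`CONJ`, `PREP`, `DEF`) and the stem is `ktAb`, which matches the idea that a single part may receive several tags. A real analyzer would also need the orthographic and morphophonemic rules, and would typically return several competing analyses for ambiguous undiacritized words.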