Abstract

This paper presents an automata-based Arabic morphological analyzer and an object-oriented data model. Arabic morphology is too complex to model exhaustively with classical approaches. The first contribution of this paper is therefore an adequate data model for representing Arabic morphological components and the related construction rules. Our proposed model, MorphoScript, is a declarative, object-oriented language that uses classes, inheritance, and aggregation as its basic constructs to define morphological components and all possible morphological links between them. The data model also relies on an annotation indexing system that semantically enriches the morphological knowledge. The second contribution is the compilation of this data model into a deterministic finite-state automaton that encodes the morphological knowledge. The resulting AMA (Arabic Morphological Automaton) forms the core of the proposed morphological analyzer. As a result, MorphoScript allowed us to represent the morphological knowledge base in a readable and highly optimized data model, and the automata generated from the MorphoScript database make morphological analysis fast, simple, and deterministic. Moreover, the compilation process is fully automatic: any morphological rule or component can be updated and the compiler rerun to obtain a new version of the automaton automatically.
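To make the pipeline described above concrete, the following Python sketch does four things. It models morphological components as classes, using inheritance for prefix, stem, and suffix and aggregation in a word pattern. It attaches annotation tags to each component. It compiles every allowed combination into a deterministic automaton, which then analyzes a word in a single left-to-right pass. This is a minimal sketch under stated assumptions, not MorphoScript or the AMA compiler. The class names (`Morpheme`, `Prefix`, `Stem`, `Suffix`, `WordPattern`, `MorphAutomaton`), the function `compile_patterns`, the tag vocabulary, and the toy transliterated lexicon are all hypothetical. The compilation step here is a plain trie construction, which yields an acyclic DFA but does not minimize it.

```python
from dataclasses import dataclass
from itertools import product

# HYPOTHETICAL object model, loosely inspired by the abstract's description
# (classes, inheritance, aggregation, annotations). Not MorphoScript syntax.
@dataclass
class Morpheme:
    form: str          # surface form in a toy Latin transliteration
    tags: tuple = ()   # annotation labels attached to the component

# Inheritance: each component kind specializes the base class.
class Prefix(Morpheme): pass
class Stem(Morpheme): pass
class Suffix(Morpheme): pass

@dataclass
class WordPattern:
    # Aggregation: a pattern groups the components that may combine.
    prefixes: list
    stems: list
    suffixes: list

    def expand(self):
        """Yield (surface form, components) for every allowed combination."""
        for parts in product(self.prefixes, self.stems, self.suffixes):
            yield "".join(m.form for m in parts), parts

class MorphAutomaton:
    """Trie-shaped DFA: at most one transition per (state, symbol)."""

    def __init__(self):
        self.transitions = [{}]  # state id -> {symbol: next state id}
        self.outputs = {}        # accepting state id -> list of analyses

    def add(self, word, analysis):
        state = 0
        for ch in word:
            nxt = self.transitions[state].get(ch)
            if nxt is None:  # create a new state for an unseen transition
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][ch] = nxt
            state = nxt
        self.outputs.setdefault(state, []).append(analysis)

    def analyze(self, word):
        """Follow one transition per character; no backtracking is needed."""
        state = 0
        for ch in word:
            state = self.transitions[state].get(ch)
            if state is None:
                return []  # no path: the word is not in the compiled lexicon
        return list(self.outputs.get(state, []))  # empty if not accepting

def compile_patterns(patterns):
    """Compile every pattern expansion into a single automaton."""
    dfa = MorphAutomaton()
    for pattern in patterns:
        for surface, parts in pattern.expand():
            # Keep only non-empty components in the analysis.
            analysis = tuple((type(m).__name__, m.form, m.tags) for m in parts if m.form)
            dfa.add(surface, analysis)
    return dfa

# Toy lexicon (hypothetical): "" stands for an absent prefix or suffix.
noun_pattern = WordPattern(
    prefixes=[Prefix(""), Prefix("wa", ("CONJ",))],
    stems=[Stem("kitAb", ("NOUN", "book")), Stem("qalam", ("NOUN", "pen"))],
    suffixes=[Suffix(""), Suffix("hu", ("PRON", "3MS"))],
)

if __name__ == "__main__":
    dfa = compile_patterns([noun_pattern])
    print(dfa.analyze("wakitAbhu"))
    print(dfa.analyze("kit"))  # prefix of a word only, not accepting
```

With this toy lexicon, `compile_patterns([noun_pattern]).analyze("wakitAbhu")` returns a single analysis that splits the word into the `wa` prefix, the `kitAb` stem, and the `hu` suffix. An unknown or incomplete string such as `kit` returns an empty list. Each state has at most one transition per symbol, so the cost of analysis grows with word length rather than lexicon size. That is one way to read the abstract's claim that the automaton makes analysis fast and deterministic.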
