Researchers in Arabic natural language processing (NLP) have not yet reached a consensus on a unified definition of stop-words, which remains a limitation for Arabic NLP in general and for Arabic morphological analysis in particular. In this work, we first give a detailed definition and classification of solid-stem words, a term we use in place of stop-words; we then propose a linguistically based morphological analysis approach for processing this class of words in Arabic. A solid-stem word has a unique morphological form, and its inflectional ending does not change regardless of where the word occurs in the sentence. A solid-stem word can be a constructed noun (pronoun, indeclinable noun, or verbal noun), an invariable verb, or a particle. We give a new classification of solid-stems based on their type of affixation. The proposed approach distinguishes between variable and invariable solid-stem types, identifies all possible affixes, and systematically generates all possible morphological variants of each word. For this purpose, we propose a formula for solid-stem affixation and describe the affix schemas in detail using a finite-state machine. Grounding the work in a sound linguistic basis lays the foundation for building efficient Arabic search engines and special-purpose information retrieval systems.
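As a rough illustration of what systematic variant generation could look like, the following minimal Python sketch enumerates prefix + stem + suffix combinations for a single solid-stem. It does not reproduce the affixation formula or the finite-state schemas of this work: the function name `generate_variants`, the two small affix lists, and the assumption that an invariable solid-stem takes no affixes are all hypothetical choices made for the example.

```python
from itertools import product

# Hypothetical affix inventories, chosen only for illustration;
# they are not the affix sets identified in this work.
PROCLITICS = ["", "و", "ف"]        # none, wa- ("and"), fa- ("so")
ENCLITICS = ["", "ه", "ها", "هم"]  # none, and the attached pronouns -hu, -ha, -hum


def generate_variants(stem, proclitics=PROCLITICS, enclitics=ENCLITICS, variable=True):
    """Return every prefix + stem + suffix combination for a solid-stem.

    Assumption: an invariable solid-stem accepts no affixes, so only
    the bare stem is returned for it.
    """
    if not variable:
        return [stem]
    return [p + stem + s for p, s in product(proclitics, enclitics)]


# The preposition "من" (min, "from") treated as a variable solid-stem.
variants = generate_variants("من")
```

With these lists, the preposition yields twelve surface forms, including the bare stem, "منه" (from him), and "ومنها" (and from her). A real system would also need to constrain which affixes are compatible with which solid-stem type, which is the role the affix schemas play.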