Non-vocalised Arabic word classifications based on mining affixes features

Mustafa Hammad, Safaa Al Haj Saleh, Sari Awwad

doi:10.1504/ijcat.2019.10020647

Abstract

Arabic word classification is a challenging problem owing to the cursive nature of the language and its modulation marks. Existing approaches rely on databases and dictionaries to classify Arabic words, which slows down the classification process. This paper therefore investigates Arabic word classification in non-vocalised Arabic text using affix features alone, and explores the extent to which these features can determine the class of an Arabic word without dictionaries or word lists. The proposed approach is based mainly on affix features and a Support Vector Machine (SVM). Fisher encoding is also applied to remove redundancy while preserving important information. The approach is tested on a data set with two main classes (noun and verb) and six different noun sub-classes. The results show that the approach correctly classifies close to 64% of the total words in the articles used in this study. Misclassifications occur when the target Arabic word has no affixes, or when some of its original characters are treated as affixes.
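To make the idea of affix features concrete, here is a minimal sketch of how a non-vocalised word might be turned into a binary feature vector before it is handed to a classifier. The prefix and suffix inventories, the `MIN_STEM` threshold, and the function name `affix_features` are hypothetical choices made for this illustration; they are not the feature set, encoding, or code used in the paper.

```python
from typing import List

# Hypothetical affix inventories (illustrative only, not the paper's lists).
# Noun-leaning prefixes: the definite article and common clitic combinations.
# Verb-leaning prefixes: future marker plus imperfect person prefixes.
PREFIXES = ["ال", "وال", "بال", "كال", "فال", "لل", "سي", "ست", "سن", "ي", "ت", "ن"]
# Suffixes: feminine and plural endings, plus some verbal subject endings.
SUFFIXES = ["ة", "ات", "ون", "ين", "ان", "وا", "تم", "نا"]

# Assumed minimum number of characters that must remain after removing an
# affix; this stops very short words from being read as "all affix".
MIN_STEM = 2


def affix_features(word: str) -> List[int]:
    """Return one 0/1 flag per prefix, followed by one 0/1 flag per suffix."""
    feats = []
    for p in PREFIXES:
        feats.append(int(word.startswith(p) and len(word) - len(p) >= MIN_STEM))
    for s in SUFFIXES:
        feats.append(int(word.endswith(s) and len(word) - len(s) >= MIN_STEM))
    return feats


if __name__ == "__main__":
    # "الكتاب" (the book): only the definite-article prefix "ال" fires.
    print(affix_features("الكتاب"))
    # "يكتبون" (they write): prefix "ي" and suffix "ون" both fire.
    print(affix_features("يكتبون"))
    # "قلم" (pen): no affixes, so every feature is 0.
    print(affix_features("قلم"))
    # "تمر" (dates, a noun): its root letter "ت" is mistaken for a verb prefix.
    print(affix_features("تمر"))
```

Vectors like these could then be passed to an SVM, for example scikit-learn's `LinearSVC` through its `fit(X, y)` and `predict(X)` methods. The last two calls illustrate the two failure modes reported above: a word without affixes produces an all-zero vector that carries no evidence, and an original root letter can be misread as an affix.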
