This research presents a new algorithm for Arabic root extraction that aims to improve the accuracy of Arabic natural language processing by addressing the weaknesses and errors of existing algorithms. The proposed algorithm uses a database of roots, patterns, and affixes to generate the candidate roots of a word without first stripping its affixes. Because it identifies candidate roots by matching the derived word against patterns, it avoids the inaccuracies introduced when affixes are removed by expectation-based methods. The study compares the proposed algorithm with three commonly used Arabic root extraction algorithms, evaluating all of them on three corpora. The proposed algorithm achieved an average accuracy of 96%, significantly higher than that of the other three algorithms.
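The pattern-matching idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the database contents or the matching procedure, so the root set, the pattern list, and the function name `candidate_roots` are all hypothetical. The sketch assumes triliteral roots, patterns written with the conventional placeholder letters ف, ع, ل, and toy patterns whose fixed letters never coincide with a placeholder letter.

```python
# Toy database (hypothetical; the real database is not given in the abstract).
ROOTS = {"كتب", "درس", "علم"}
# Patterns keep their affix letters; ف, ع, ل mark the three root slots.
PATTERNS = ["فاعل", "مفعول", "يفعلون", "مفعلة"]
PLACEHOLDERS = "فعل"


def candidate_roots(word, patterns=PATTERNS, roots=ROOTS):
    """Return (root, pattern) pairs for every pattern the whole word fits.

    The word is matched as-is, with no affix stripping beforehand.
    Simplification: a placeholder letter is always treated as a root slot,
    so patterns with a fixed ف, ع, or ل are not supported here.
    """
    matches = []
    for pattern in patterns:
        if len(pattern) != len(word):
            continue
        slots = {}
        fits = True
        for p_char, w_char in zip(pattern, word):
            if p_char in PLACEHOLDERS:
                slots[p_char] = w_char      # root slot: capture the letter
            elif p_char != w_char:
                fits = False                # fixed letter must match exactly
                break
        if not fits or len(slots) != len(PLACEHOLDERS):
            continue
        root = "".join(slots[p] for p in PLACEHOLDERS)
        if root in roots:                   # keep only roots known to the database
            matches.append((root, pattern))
    return matches


if __name__ == "__main__":
    for w in ["مكتوب", "كاتب", "يدرسون", "مدرسة"]:
        print(w, candidate_roots(w))
```

Checking each extracted root against the root database is what filters out accidental pattern fits; a word that fits no pattern simply yields an empty list in this sketch.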