Abstract

Summary form only given. We present a new stemming algorithm for extracting quadri-literal Arabic roots. The algorithm first removes the prefixes and then checks the characters of the word from the last letter backward to the first. One temporary matrix stores the suffix letters of the Arabic word, and another matrix stores the root letters. Before the word is partitioned, the particle is removed from the source word. Each letter is tested for membership in the general standard Arabic word: if the test is positive, the letter is stored in the temporary matrix; otherwise it is stored in the root matrix. In some cases, original letters of the word are mutated so that the substitute letters can be stored in the root matrix. Finally, the letters in the root matrix are arranged according to their order in the original word. The algorithm was tested on a sample of 200 randomly generated words derived from quadri-literal Arabic verbs and reached an accuracy rate of 95%.
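The backward scan described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the function name `extract_root`, the `PREFIXES` list, the use of plain lists in place of the two matrices, and in particular the `AFFIX_LETTERS` set, which here takes the traditional augmentative letters as a stand-in for the "general standard Arabic word" that the abstract does not spell out. The letter-mutation step is omitted because the abstract gives no rules for it.

```python
# Assumption: a small, hypothetical list of leading particles/prefixes,
# ordered longest first so the longest match is removed.
PREFIXES = ("وال", "فال", "بال", "ال")

# Assumption: the traditional augmentative letters stand in for the
# "general standard Arabic word" mentioned in the abstract.
AFFIX_LETTERS = set("ءأاتسلمنهوي")


def extract_root(word):
    """Return the candidate root letters of `word` in their original order."""
    # Step 1: remove a leading particle/prefix, keeping at least 4 letters.
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 4:
            word = word[len(prefix):]
            break

    temp = []  # plays the role of the temporary (affix) matrix
    root = []  # plays the role of the root matrix: (position, letter) pairs

    # Step 2: scan from the last letter backward to the first.
    for i in range(len(word) - 1, -1, -1):
        letter = word[i]
        if letter in AFFIX_LETTERS:
            temp.append(letter)       # positive test: treat as affix letter
        else:
            root.append((i, letter))  # negative test: treat as root letter

    # Step 3: arrange the root letters by their order in the original word.
    root.sort()
    return "".join(letter for _, letter in root)


if __name__ == "__main__":
    print(extract_root("يدحرجون"))
    print(extract_root("والمدحرج"))
```

Both sample words derive from the quadri-literal verb دحرج, whose letters all fall outside the assumed affix set, so the sketch recovers the root in each case. A root that itself contains one of the affix letters would be mishandled by this simplified test, which is presumably where the mutation and additional checks of the full algorithm come in.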
