Abstract

According to the desired level of analyzing words, Arabic stemming algorithms can be classified into stem-based (light stemming algorithms), and root-based algorithms. Light stemming algorithms only remove prefixes and suffixes from the words, while root-based algorithms remove prefixes, suffixes and infixes. There are several light stemmers for Arabic (Light1, Light2, Light3, Light8, and Light10), For retrieval information Light10 stemmer is out-performed the other light stemmers. In this paper, Arabic stemming algorithms are studied. And, literature review of Arabic stemmers is discussed. In addition, a new Arabic light stemmer was proposed and Implemented. The main step of the light stemmer is removing the prefixes and suffixes of the words. And because this step causes changing of the meaning of some words, many other steps are designed and implemented in the proposed stemmer. The proposed stemmer and Light10 stemmer were tested on the same Arabic data which is developed in this work. The accuracy rate of Light10 stemmer was 66%, while the accuracy rate of the proposed stemmer was 88.25 %. The reasons for incorrect stemming of the proposed stemmer are mentioned.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call