Abstract

It is well known in the literature that light stemmers tend to produce more under-stemming errors, whereas root stemmers tend to produce more over-stemming errors. In this work, we address the Arabic light stemming problem and propose an improved version of the ARLSTem algorithm, ARLSTem v1.1. In particular, we introduce new rules that correct some of the under-stemming errors produced by ARLSTem. We also compare the new version with five existing stemming algorithms on the ARASTEM corpus, which we corrected after finding errors in seven of its samples. The experimental results show that ARLSTem v1.1 outperforms the other algorithms in terms of under-stemming and over-stemming errors. Moreover, it shows promising performance in the Arabic text categorization task.
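
To make the two error types concrete, the following is a minimal sketch of how under-stemming and over-stemming can be counted against gold concept groups. It is not the evaluation code used in this work: the function names, the two toy stemmers, and the English word groups are hypothetical and chosen only for illustration. An under-stemming error is counted when two words from the same group receive different stems, and an over-stemming error when two words from different groups receive the same stem.

```python
from itertools import combinations


def count_stemming_errors(groups, stem):
    """Count (under, over) stemming error pairs against gold concept groups.

    groups: list of word lists; each list holds words that share one concept.
    stem:   function mapping a word to its stem.
    """
    # Pair every stem with the index of the gold group its word belongs to.
    labelled = [(stem(w), gid) for gid, words in enumerate(groups) for w in words]
    under = over = 0
    for (s1, g1), (s2, g2) in combinations(labelled, 2):
        if g1 == g2 and s1 != s2:
            under += 1  # same concept, different stems
        elif g1 != g2 and s1 == s2:
            over += 1   # different concepts, same stem
    return under, over


def toy_light_stemmer(word):
    """Hypothetical light stemmer: strips at most one short suffix."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_aggressive_stemmer(word):
    """Hypothetical aggressive stemmer: keeps only the first four characters."""
    return word[:4]


# Hypothetical gold groups, one list per concept.
groups = [
    ["connect", "connected", "connecting", "connection"],
    ["wand", "wands"],
    ["wander", "wandering"],
]

light_errors = count_stemming_errors(groups, toy_light_stemmer)
aggressive_errors = count_stemming_errors(groups, toy_aggressive_stemmer)
print("light:", light_errors)
print("aggressive:", aggressive_errors)
```

On this toy data the conservative stemmer leaves "connection" unmerged with its group (under-stemming only), while the aggressive one merges the "wand" and "wander" groups (over-stemming only), which mirrors the trade-off described above.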
