Abstract

Stemming reduces words to their stems or roots. The process is vital to research fields such as text mining, sentiment analysis, and text categorization. Several techniques have been proposed for stemming Arabic text; among them, the Khoja and light-10 stemmers are the most widely used. In this paper, we propose and evaluate two stemming techniques for Arabic that build on light stemming. The new stemmers are compared with the best reported light stemmer, light-10. Experiments conducted on standard collections show that the proposed stemmers yield 5.13% and 13.1% improvements in retrieval performance over light-10, with average precision of 0.369 and 0.397, respectively, and that the improvements are statistically significant.
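To make the idea of light stemming concrete, the following is a minimal sketch of affix stripping in Python. It is not the light-10 stemmer or either of the stemmers proposed in this paper: the affix lists, the minimum stem length, and the function name `light_stem` are all illustrative assumptions.

```python
# Illustrative affix lists (an assumption, not the lists used in this paper).
# Longer prefixes come first so that they are tried before their shorter tails.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")
MIN_STEM_LEN = 2  # hypothetical guard against over-stemming


def light_stem(word: str) -> str:
    """Strip at most one prefix, then strip suffixes repeatedly."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break  # only one prefix is removed

    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
                word = word[: -len(suffix)]
                stripped = True
                break  # restart the scan on the shortened word
    return word


# Two inflected forms of the same noun end up in one conflation class.
print(light_stem("المعلمون") == light_stem("معلمين"))
```

A light stemmer of this kind never consults a dictionary or attempts root extraction; it only removes a small set of frequent affixes, which is why the choice of affix lists largely determines which words are conflated.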

Highlights

  • Arabic is the largest of the Semitic languages

  • Light-10 performed worse than the proposed stemmers because it clusters words with the same meaning into different conflation classes, whereas Arabic derives many words from a single stem or a single verb

  • In terms of average precision, the proposed Extended-light stemmer (ExtendedS run) yields a 5.13% improvement in retrieval performance over light-10 and performs significantly better than the baseline
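For readers unfamiliar with the measures quoted above, the sketch below shows one common way to compute average precision for a single ranked list and the relative improvement of one score over another. The function names and the sample document identifiers are hypothetical, and whether the paper averages this quantity over topics in exactly this way is an assumption.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at each rank where a relevant document appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def relative_improvement(new_score, baseline_score):
    """Percentage gain of new_score over baseline_score."""
    return 100.0 * (new_score - baseline_score) / baseline_score


# Hypothetical ranking: relevant documents appear at ranks 1 and 3.
ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
print(round(ap, 4))
```

Relative improvement is reported as a percentage of the baseline score, so the same absolute gain looks larger against a weaker baseline.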

Summary

Introduction

Arabic is the largest of the Semitic languages. It is the native language of more than four hundred million people [1], centered in the Arab region, which includes North Africa and the Middle East. Arabic is written from right to left, and its script has 28 letters. Unlike other popular Semitic languages, words are usually written in a cursive (connected), rather than discontinuous, longhand style [2], with spaces delimiting words from one another. As a result of this cursive style, each Arabic letter can be written with different glyphs according to its position in a word, e.g.
