Abstract

Stemming is a key step in processing morphologically rich languages such as Arabic. It is widely used in fields such as Natural Language Processing, Information Retrieval (IR), and Text Mining. The goal of stemming is to reduce inflected or derived words, starting from their written form, to their base (root or stem). Since Arabic relies mainly on roots and patterns to generate words, we develop a new, efficient heavy/light stemmer based on the interaction between roots and patterns, drawing on rich linguistic resources. The stemmer provides three different outputs: an individual root, a stem, and a combined stem/root. In this paper, we evaluate the developed stemmer through various experiments on both Modern Standard Arabic and Classical Arabic. It achieves accuracies of 96.93% on the Quranic corpus Al-Mus'haf and 96.56% on the NEMLAR corpus. For usability testing, we study the stemmer's effectiveness for IR and Part of Speech (PoS) tagging. The results show an improvement of 10.98% in PoS tagging and of 14.12% in search efficiency.
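The root-and-pattern idea behind such a stemmer can be illustrated with a toy sketch. It is not the authors' algorithm. The affix lists, the pattern inventory, and the function names `light_stem`, `match_pattern` and `root_of` are hypothetical and far smaller than the linguistic resources the paper relies on. Patterns use the conventional template letters ف ع ل as slots for the three root consonants. A light step strips one prefix and one suffix, and a heavy step matches the remaining stem against the patterns to extract a root. The sketch returns a (stem, root) pair, loosely mirroring the stem and root outputs mentioned above.

```python
from typing import List, Optional, Tuple

# Hypothetical toy resources: NOT the paper's affix lists or pattern inventory.
# Affixes are listed longest-first so the longest matching affix is tried first.
PREFIXES: List[str] = ["وال", "بال", "فال", "ال", "و"]
SUFFIXES: List[str] = ["ات", "ون", "ين", "ة", "ه"]
# Patterns use the template letters ف, ع, ل as slots for the root consonants.
PATTERNS: List[str] = ["فاعل", "مفعول", "فعال", "فعيل", "تفعيل", "فعل"]
ROOT_SLOTS = ("ف", "ع", "ل")
MIN_STEM = 3  # assumption: triliteral roots, so never strip below three letters


def light_stem(word: str) -> str:
    """Light stemming: strip at most one prefix and at most one suffix."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            word = word[: -len(s)]
            break
    return word


def match_pattern(stem: str, pattern: str) -> Optional[str]:
    """Return the root letters if the stem fits the pattern, else None."""
    if len(stem) != len(pattern):
        return None
    root = []
    for ch, slot in zip(stem, pattern):
        if slot in ROOT_SLOTS:
            root.append(ch)  # slot position: capture a root consonant
        elif ch != slot:
            return None  # fixed pattern letter must match exactly
    return "".join(root)


def root_of(word: str) -> Tuple[str, Optional[str]]:
    """Return (stem, root); root is None when no toy pattern fits."""
    stem = light_stem(word)
    for pattern in PATTERNS:
        root = match_pattern(stem, pattern)
        if root is not None:
            return stem, root
    return stem, None


if __name__ == "__main__":
    for w in ["الكاتب", "مكتوب", "كتاب", "الكاتبات", "استخدام"]:
        print(w, root_of(w))
```

Here "الكاتب", "مكتوب", "كتاب" and "الكاتبات" all reduce to the root كتب through different patterns. "استخدام" gets no root because the toy pattern list has no seven-letter template. The sketch returns the first pattern that matches. A practical heavy stemmer would also need to rank competing analyses, handle weak and doubled roots, and validate candidates against a root lexicon.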
