Stemmer Impact on Quranic Mobile Information Retrieval Performance

Huda Omar, Mohammed Dahab, Mahmoud Kamal

doi:10.14569/ijacsa.2016.071218

Abstract

Stemming algorithms are employed in information retrieval (IR) to reduce the variants of a word that differ in their endings to a standard stem. Stemmers also help IR systems by unifying vocabulary, reducing term variants, reducing storage space, and increasing the likelihood of matching documents, all of which make stemming attractive for use in IR. This paper studies the impact of stemming techniques on the retrieval effectiveness of a mobile Quranic IR application. Two word-extraction stemming techniques are used: a light stemmer (light10) and a dictionary-lookup stemmer. Three sets of experiments were conducted with the aim of raising the efficiency of mobile applications. Both stemming approaches were implemented and their accuracy was assessed by calculating precision, recall, mean average precision (MAP), and F-measure. The results show that the light10 stemmer outperforms the dictionary-lookup stemmer in precision and MAP. Furthermore, the mobile performance of the light10 stemmer exceeds that of the dictionary-lookup stemmer.
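The evaluation measures named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of precision, recall, F-measure (balanced F1), and MAP, and it is not the authors' evaluation code. The function names and the verse identifiers in the usage note are hypothetical, and each ranked result list is assumed to contain no duplicate entries.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_recall_f1(retrieved: Sequence[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and balanced F-measure for one query."""
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant items that were never retrieved contribute zero.
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of average precision over (ranked results, relevant set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For example, with a hypothetical ranked result `["v1", "v2", "v3", "v4"]` and relevant set `{"v1", "v3", "v5"}`, `precision_recall_f1` gives a precision of 0.5 and a recall of 2/3, and `average_precision` gives 5/9. Comparing two stemmers then amounts to running the same queries through each and comparing these scores.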

Highlights

The Holy Quran is a global source of knowledge for humanity in general and Muslims in particular.
Since the Holy Quran is the divine revelation and the word of God, it needs careful handling when processed by automated methods of natural language processing (NLP). The Holy Quran is written in the Arabic language, which is known to be one of the more challenging natural languages in the field [1].
To study the impact of different stemming approaches on the accuracy of Quranic information retrieval (IR), dictionary-lookup and light10 stemmers were used in the mobile application.

Summary

Introduction

The Holy Quran is a global source of knowledge for humanity in general and Muslims in particular. Studying and learning the Holy Quran plays a central role in the lives of all Muslims. Since the Holy Quran is the divine revelation and the word of God, it needs careful handling when processed by automated methods of natural language processing (NLP). The Holy Quran is written in the Arabic language, which is known to be one of the more challenging natural languages in the field [1]. Many researchers have been interested in developing search techniques for the Quranic text. The techniques employed to retrieve information from the Quran can be classified into two types: semantic-based and lexical-based. The lexical-based search yields results according to the morphological analysis of the query.

Objectives

Methods

Results

Conclusion