Stemming methodologies over individual query words for an Arabic Information Retrieval System

Hani Abu‐Salem, Martha W Evens, Mahmoud Al‐Omari

doi:10.1002/(sici)1097-4571(1999)50:6<524::aid-asi7>3.3.co;2-d

Abstract

Stemming is one of the most important factors that affect the performance of information retrieval systems. This article investigates how to improve the performance of an Arabic Information Retrieval System (Arabic-IRS) by choosing the retrieval method separately for each word of a query, depending on the importance of the WORD, the STEM, or the ROOT of that query term in the database. This method, called Mixed Stemming, computes term importance using a weighting scheme, called TFxIDF, that combines the Term Frequency (TF) and the Inverse Document Frequency (IDF). An extended version of the Arabic-IRS system is designed, implemented, and evaluated to reduce the number of irrelevant documents retrieved. The results of the experiment suggest that the proposed method outperforms the Word index method using either the Binary or the TFxIDF weighting scheme. It also outperforms the Stem index method and the Root index method when they use the Binary weighting scheme, but it does not outperform either of them when they use the TFxIDF weighting scheme.
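To make the idea in the abstract concrete, the following is a minimal sketch of per-word level selection driven by TFxIDF. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact weighting formula, so the common form tf × log(N / df) is assumed; the `importance` score (a term's highest weight in any document), the `pick_level` rule (take the level whose form scores highest), and the transliterated toy tokens are all hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: tf * log(N / df)} dict per document.

    Assumption: the textbook TFxIDF form; the paper's exact variant may differ.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: tf * math.log(n / df[term]) for term, tf in Counter(doc).items()}
        for doc in docs
    ]


def importance(term, weights):
    """Hypothetical importance score: the term's highest weight in any document."""
    return max((w.get(term, 0.0) for w in weights), default=0.0)


def pick_level(forms, indexes):
    """Pick 'word', 'stem', or 'root' for one query term.

    forms:   level -> the query term reduced to that level
    indexes: level -> per-document TFxIDF weights for that index
    Assumption: the level with the highest importance wins.
    """
    return max(forms, key=lambda level: importance(forms[level], indexes[level]))


# Toy collection of three documents, indexed at three levels.
# The tokens are invented placeholders, not output of a real Arabic stemmer.
indexes = {
    "word": tfidf_weights([["alkitab", "jadid"], ["kutub", "qadim"], ["maktab", "jadid"]]),
    "stem": tfidf_weights([["kitab", "jadid"], ["kitab", "qadim"], ["maktab", "jadid"]]),
    "root": tfidf_weights([["ktb", "jdd"], ["ktb", "qdm"], ["ktb", "jdd"]]),
}

forms = {"word": "alkitab", "stem": "kitab", "root": "ktb"}
chosen = pick_level(forms, indexes)
print(chosen)
```

In this toy collection the root form occurs in every document, so its IDF is zero and it cannot discriminate between documents, while the surface word occurs in only one document and therefore receives the highest weight. A real system would compute these weights over the full WORD, STEM, and ROOT indexes.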

Full Text