Abstract
Term proximity statistics, which reward documents in which the matched query terms occur close to one another, have proved effective in improving document retrieval performance. However, this line of research remains unexplored for Arabic information retrieval (IR), even though the undiacritized text and rich morphology of the Arabic language complicate the retrieval process. In this paper, we propose to boost Arabic information retrieval performance by using proximity information. Our aim is to evaluate proximity features for the Arabic language in order to go beyond the bag-of-words representation and to overcome the problems related to text preprocessing. We investigate several state-of-the-art proximity models, including the Cross-Term model (CRTER), the Markov Random Field model (MRF), the Divergence From Randomness (DFR) multinomial model, and the Positional Language Model (PLM). For preprocessing, we use the Khoja and light stemming algorithms. Experiments are performed on the Arabic TREC-2001/2002 collection using the Terrier IR platform. The results show significant improvements when proximity-based models are used for Arabic IR.
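As a rough illustration of the general proximity idea described above, the following Python sketch scores a tokenized document by summing the inverse of the minimum distance between each pair of matched query terms. It is a toy feature written for this illustration only: it is not CRTER, MRF, the DFR multinomial model, or PLM, and the function names, the 1/d weighting, and the English example tokens are all assumptions rather than anything taken from the paper.

```python
from itertools import combinations


def term_positions(tokens, query_terms):
    """Map each query term to the list of token positions where it occurs."""
    positions = {term: [] for term in query_terms}
    for index, token in enumerate(tokens):
        if token in positions:
            positions[token].append(index)
    return positions


def min_pair_distance(positions_a, positions_b):
    """Smallest absolute distance between any occurrences of two terms."""
    return min(abs(a - b) for a in positions_a for b in positions_b)


def proximity_score(tokens, query_terms):
    """Toy proximity feature (hypothetical, not a model from the paper).

    For every pair of distinct query terms that both occur in the document,
    add 1/d, where d is the minimum distance between their occurrences.
    Closer co-occurrences therefore contribute a larger reward.
    """
    unique_terms = sorted(set(query_terms))
    positions = term_positions(tokens, unique_terms)
    score = 0.0
    for term_a, term_b in combinations(unique_terms, 2):
        if positions[term_a] and positions[term_b]:
            score += 1.0 / min_pair_distance(positions[term_a], positions[term_b])
    return score


# Hypothetical, already-stemmed example documents.
near_doc = "arabic information retrieval uses proximity".split()
far_doc = "arabic text is hard and retrieval suffers".split()
query = ["arabic", "retrieval"]

near_score = proximity_score(near_doc, query)
far_score = proximity_score(far_doc, query)
```

Under a bag-of-words view both example documents match the query equally, since each contains both terms once; the proximity feature separates them because the terms are two positions apart in the first document and five in the second. In practice such a feature would be combined with a standard term-weighting score rather than used alone.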