Abstract

In an era of information overload, information retrieval systems are vital applications, and many researchers have tried to improve search results by introducing new methods. Unlike English, some languages, such as Arabic, have complex morphology and lack both linguistic and semantic resources. This paper proposes a language-independent, semantic-based information retrieval approach that expands the user query using bi-gram term collocations. The approach makes two main contributions. First, the bi-gram term collocations used to expand the user query are mined automatically from the text corpus, so no external semantic resource is needed. Second, because of the language's complex morphology, the system index is built from the corpus words themselves, avoiding the cost and effort of stemming. A prototype system for Arabic was implemented and evaluated against the stem-based method, with the experiments conducted on the text of the Holy Quran. The results show that the proposed system outperforms the stem-based method in terms of precision and recall.
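The abstract does not say how collocations are scored or how expanded queries are ranked, so the following is only a minimal sketch under stated assumptions. It scores adjacent word pairs with pointwise mutual information (PMI) plus a frequency cutoff, indexes unstemmed surface words, expands each query term with its collocates, and ranks documents by the number of expanded terms they contain. The function names, the PMI criterion, whitespace tokenization, and the thresholds `min_count` and `min_pmi` are all hypothetical choices, not the authors' method.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization over surface word forms.
    # No stemming is applied, mirroring the word-based index idea.
    return text.split()


def mine_bigram_collocations(docs, min_count=2, min_pmi=0.0):
    """Map each word to the set of words it forms a collocation with.

    Hypothetical criterion: an adjacent pair (w1, w2) is a collocation if it
    occurs at least min_count times and its PMI exceeds min_pmi.
    """
    unigrams = Counter()
    bigrams = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    collocates = defaultdict(set)
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        if math.log(p_pair / (p_w1 * p_w2)) > min_pmi:
            # Record the relation in both directions for query expansion.
            collocates[w1].add(w2)
            collocates[w2].add(w1)
    return collocates


def build_index(docs):
    # Inverted index from surface word form to the ids of documents containing it.
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for token in tokenize(doc):
            index[token].add(doc_id)
    return index


def expand_query(query, collocates):
    # Add every mined collocate of each query term to the query.
    terms = tokenize(query)
    expanded = set(terms)
    for term in terms:
        expanded |= collocates.get(term, set())
    return expanded


def search(query, index, collocates):
    # Score = number of expanded query terms a document contains;
    # ties are broken by document id so the ranking is deterministic.
    scores = Counter()
    for term in expand_query(query, collocates):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return [doc_id for doc_id, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]
```

Consider a toy corpus `["holy book guidance", "holy book mercy", "book guidance light", "mercy light"]` with `min_count=2`. Here the pair `holy book` is mined as a collocation, so the query `holy` is expanded to `{holy, book}`. Document 2 contains `book` but not `holy`, so it is also retrieved, which shows the recall benefit the abstract attributes to expansion.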
