Abstract

Most information retrieval systems retrieve documents by keyword matching, and therefore fail to retrieve documents that express a similar meaning with syntactically different keywords (forms). One well-known approach to overcoming this limitation is query expansion (QE). Several approaches to query expansion exist. The statistical approach relies on term frequency to generate expansion features, but it does not consider meaning or term dependency. The semantic approach relies on a knowledge base, which contains only a limited number of terms and relations. In this paper, we propose a hybrid approach to query expansion that combines the statistical and semantic approaches. To select the optimal terms for query expansion, we propose an effective weighting method based on particle swarm optimization (PSO). A system prototype was implemented as a proof of concept, and its accuracy was evaluated. The experiments were carried out on a real dataset. The experimental results confirm that the proposed approach improves the accuracy of query expansion.

Highlights

  • Information retrieval (IR) is an active research field that aims to extract the most relevant documents from large datasets

  • Some Arabic stop-word lists are available in different studies, such as [4], but none of them has been shown to be efficient for Arabic information retrieval

  • 40 query documents (QDocs) were designed manually by an Arabic language expert to verify the correctness of our approach


Summary

Introduction

Information retrieval (IR) is an active research field that aims to extract the most relevant documents from large datasets. The query expansion methods related to our approach are presented in the following subsections.

Term frequency

In document retrieval theory, the document and the query are each represented as a vector in a vector space. A few years later, researchers improved term frequency performance by computing the number of terms relative to the document length. Ounis [11] studied the effect of document length in the collection. This method is widely accepted for its simplicity and efficiency, yet it ignores term order and the semantic relations between terms. This limitation makes it unsuitable for measuring word similarity.
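
The term-frequency representation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names tf_vector and cosine, the whitespace tokenization, and the toy English documents are all assumptions made for illustration. It normalizes raw counts by document length and compares documents by cosine similarity, which makes both limitations visible: reordering the words leaves the score unchanged, and documents with related meaning but no shared terms score zero.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Length-normalized term frequency: count of a term / document length."""
    total = len(tokens)
    if total == 0:
        return {}
    return {term: count / total for term, count in Counter(tokens).items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy documents, tokenized on whitespace.
doc_a = "query expansion improves retrieval".split()
doc_b = "retrieval improves expansion query".split()   # same words, new order
doc_c = "enlarging searches helps finding".split()     # related meaning, no shared terms

same_words = cosine(tf_vector(doc_a), tf_vector(doc_b))
no_overlap = cosine(tf_vector(doc_a), tf_vector(doc_c))
```

Here same_words is 1 (up to floating-point rounding) because the vectors contain no word-order information, while no_overlap is 0 because the representation has no notion of semantic relatedness. These are the gaps that motivate combining a statistical approach with a semantic one.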

