A hybrid semantic query expansion approach for Arabic information retrieval

Hiba Almarwi, Ibrahim Al-Baltah, Mossa Ghurab

doi:10.1186/s40537-020-00310-z

Open Access

Journal: Journal of Big Data
Publication Date: Jun 29, 2020
Citations: 20
License type: open-access

Affiliation: Yemenia University, Sana'a University

Abstract

Most information retrieval systems retrieve documents by keyword matching, so they fail to retrieve documents that share a query's meaning but express it with syntactically different keywords (word forms). One well-known approach to overcoming this limitation is query expansion (QE). There are several approaches to query expansion, such as the statistical approach, which relies on term frequency to generate expansion features but does not consider meaning or term dependency. Other approaches, such as the semantic approach, depend on a knowledge base that contains only a limited number of terms and relations. In this paper, the researchers propose a hybrid query expansion approach that utilizes both the statistical and the semantic approaches. To select the optimal terms for query expansion, they propose an effective weighting method based on particle swarm optimization (PSO). A system prototype was implemented as a proof of concept, and its accuracy was evaluated. The experiments were carried out on a real dataset, and the results confirm that the proposed approach improves the accuracy of query expansion.
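The idea of combining statistical and semantic evidence can be illustrated with a minimal sketch. It assumes a toy corpus, a hand-built synonym dictionary standing in for the knowledge base, document co-occurrence as the statistical signal, and a fixed mixing weight `alpha`. The data and the names `cooccurrence_scores`, `semantic_scores`, and `expand` are hypothetical. This is not the authors' method, which selects expansion terms with a PSO-based weighting scheme rather than a fixed weight. English placeholder tokens are used for readability; an Arabic system would operate on normalized Arabic tokens.

```python
from collections import Counter

# Hypothetical toy corpus: each document is a list of already-normalized tokens.
CORPUS = [
    ["car", "engine", "repair"],
    ["car", "vehicle", "engine"],
    ["vehicle", "insurance", "price"],
    ["car", "price", "sale"],
]

# Hypothetical synonym dictionary standing in for a semantic knowledge base.
THESAURUS = {
    "car": {"vehicle", "automobile"},
}


def cooccurrence_scores(query_term, corpus):
    """Statistical evidence: share of documents containing query_term
    that also contain each other term."""
    docs_with_term = [set(doc) for doc in corpus if query_term in doc]
    counts = Counter()
    for doc in docs_with_term:
        for term in doc - {query_term}:
            counts[term] += 1
    n = len(docs_with_term)
    return {t: c / n for t, c in counts.items()} if n else {}


def semantic_scores(query_term, thesaurus):
    """Semantic evidence: score 1.0 for every synonym in the knowledge base."""
    return {t: 1.0 for t in thesaurus.get(query_term, set())}


def expand(query_term, corpus, thesaurus, alpha=0.5, k=3):
    """Rank candidate expansion terms by a weighted mix of both signals.
    alpha=1.0 is purely statistical, alpha=0.0 purely semantic."""
    stat = cooccurrence_scores(query_term, corpus)
    sem = semantic_scores(query_term, thesaurus)
    candidates = set(stat) | set(sem)
    combined = {
        t: alpha * stat.get(t, 0.0) + (1 - alpha) * sem.get(t, 0.0)
        for t in candidates
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


if __name__ == "__main__":
    print([term for term, _ in expand("car", CORPUS, THESAURUS)])
```

With `alpha=0.5`, the synonym `vehicle`, which also co-occurs with `car` in the corpus, ranks first; setting `alpha=1.0` reduces the sketch to a purely statistical expansion in which `engine` ranks first. Replacing the fixed `alpha` with weights chosen by an optimizer is where a method such as PSO would come in.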

Highlights

Information retrieval (IR) is an active research field that aims at extraction of the most relevant documents from large datasets
Although several Arabic stop-word lists are available in different studies, such as [4], none of them has proven efficient in Arabic information retrieval
40 query documents (QDocs) were designed manually by an expert in the Arabic language to verify the correctness of our approach

Summary

Introduction

Information retrieval (IR) is an active research field that aims to extract the most relevant documents from large datasets. The query expansion methods related to our approach are presented in the following subsections.

Term frequency

In document retrieval theory, documents and queries are each represented as vectors in a vector space. A few years later, researchers improved the performance of term frequency by taking document length into account, that is, by relating the number of term occurrences to the length of the document. Ounis [11] studied the effect of document length in the collection. This method is widely accepted because of its simplicity and efficiency, yet it ignores the order of terms and the semantic relations between them. This limitation makes it unsuitable for measuring word similarity.
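The vector space representation and length-normalized term frequency described above can be sketched as follows. This assumes whitespace-tokenized input, raw counts divided by document length as the normalization, and cosine similarity for matching; the helper names `tf_vector` and `cosine` are hypothetical, and the paper does not prescribe this exact formulation. The final lines illustrate the two limitations mentioned in the text: word order is lost, and a document that uses a synonym of the query term scores zero.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Sparse term-frequency vector, normalized by document length:
    weight(t) = count(t) / len(tokens)."""
    counts = Counter(tokens)
    n = len(tokens)
    return {t: c / n for t, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


query = tf_vector(["car", "repair"])
doc_a = tf_vector(["car", "engine", "repair", "car"])
doc_b = tf_vector(["vehicle", "repair", "shop"])

if __name__ == "__main__":
    print(cosine(query, doc_a), cosine(query, doc_b))
    # Limitation 1: bag-of-words vectors ignore word order.
    print(tf_vector(["a", "b"]) == tf_vector(["b", "a"]))
    # Limitation 2: synonyms do not match, which motivates query expansion.
    print(cosine(tf_vector(["car"]), tf_vector(["automobile"])))
```

Because cosine similarity is itself scale-invariant, dividing by document length does not change the cosine scores here; the normalization matters when term-frequency weights or raw dot products are compared directly across documents of different lengths.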

Methods

Results

Conclusion