A query expansion method based on topic modeling and DBpedia features

Sarah Dahir, Abderrahim El Qadi

doi:10.1016/j.jjimei.2021.100043

Abstract

Query Expansion (QE) is a method for improving Information Retrieval (IR) by adding terms that are mostly selected from feedback documents and are similar to the user's query terms. However, because queries contain very few keywords on average, it is sometimes difficult to detect the context of the user query and expand it accordingly, especially when it contains ambiguous (i.e., polysemous) terms. Linked Open Data (LOD) sources may be exploited to this end. Yet most attributes in linked data are multi-valued, which leaves a system unable to determine the right one(s) to use for expansion, and the few single-valued attributes are too long and noisy to use directly. To deal with these issues, the integration of a topic modeling process is proposed to predict the latent semantic attribute-topics to use for expansion. The approach reconstructs candidate documents for a given query using the Bose-Einstein statistics distribution technique (Bo1) and DBpedia attributes. Latent Dirichlet Allocation (LDA) topic models are then generated from these documents, and the relevant expansion terms are determined from them. The proposed method was evaluated on the AP dataset collection, and the experiments revealed significant improvements compared with the retrieval results of the Bo1 distribution technique. The proposed “LDA-LinkedBo1” approach also outperformed DBpedia association-based approaches in terms of MRR@N.
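
For readers unfamiliar with Bo1, the following minimal sketch shows how candidate expansion terms might be scored from feedback documents. It assumes the commonly cited Bo1 form from the Divergence From Randomness framework, w(t) = tf_x * log2((1 + P_n) / P_n) + log2(1 + P_n) with P_n = F / N, which the abstract itself does not spell out. The function names, the toy documents, and the collection statistics are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def bo1_weights(feedback_docs, collection_freq, num_docs):
    """Score terms from feedback documents with the (assumed) Bo1 formula.

    feedback_docs   -- list of token lists (the top-ranked documents)
    collection_freq -- dict: term -> total frequency F in the whole collection
    num_docs        -- number of documents N in the collection
    """
    # tf_x: frequency of each term across all feedback documents
    tf_x = Counter(tok for doc in feedback_docs for tok in doc)
    weights = {}
    for term, tf in tf_x.items():
        f = collection_freq.get(term, 0)
        if f <= 0:
            continue  # no collection statistics: skip the term
        p_n = f / num_docs
        weights[term] = tf * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)
    return weights


def top_expansion_terms(weights, k):
    """Return the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy data, for illustration only.
docs = [["jaguar", "car", "speed"], ["jaguar", "car"]]
stats = {"jaguar": 10, "car": 1000, "speed": 100}
w = bo1_weights(docs, stats, num_docs=1000)
best = top_expansion_terms(w, k=2)
```

In this toy setting, terms that are frequent in the feedback documents but rare in the collection receive the highest weights. In the approach described above, such scoring would be only one ingredient, combined with DBpedia attributes and LDA topic models.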

Full Text