Abstract

Query Expansion (QE) approaches, which reformulate a query by adding new terms to the initial user query, are intended to reduce the vocabulary mismatch between the query keywords and the documents in Information Retrieval Systems (IRS). A central problem in QE is selecting the right candidate terms for expansion. Linked Data can serve as a valuable resource for this purpose, because it provides additional expansion features such as the values of sub- and superclasses of resources. The underlying research question is whether interlinked data and vocabulary items provide features that can be taken into account for query expansion. In this paper, we introduce a new QE approach that aims to improve IRS by combining the well-known distribution-based method Bose-Einstein statistics (Bo1) with Linked Data from the knowledge base DBpedia, using different numbers of expansion terms. We evaluated the effectiveness of each method individually, as well as of their combinations, on two Text REtrieval Conference (TREC) test collections. Our approach led to significant improvements in precision, recall, Mean Average Precision (MAP) at rank 10, and normalized Discounted Cumulative Gain (nDCG) at different ranks compared to Pseudo Relevance Feedback (PRF), which we used as a baseline. The results show that the inclusion of semantic annotations clearly improves retrieval performance over the baseline method.
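To make the distribution-based component concrete, the following is a minimal sketch of Bo1 term weighting in its commonly cited Divergence From Randomness form, w(t) = tf_x · log2((1 + P_n) / P_n) + log2(1 + P_n), where tf_x is the frequency of term t in the top-ranked feedback documents and P_n = F / N is its collection frequency F divided by the number of documents N. The abstract does not state the exact configuration used, so the function names (`bo1_weights`, `expand_query`), the tokenised input format, and the cut-off parameter `k` are illustrative assumptions, and the DBpedia-based features are not modelled here.

```python
import math
from collections import Counter


def bo1_weights(feedback_docs, collection_tf, num_docs):
    """Score candidate expansion terms with the Bo1 (Bose-Einstein) model.

    feedback_docs: list of token lists (assumed top-ranked feedback documents)
    collection_tf: dict mapping term -> total frequency in the whole collection
    num_docs:      number of documents in the collection
    """
    # tf_x: frequency of each term within the feedback set
    tf_x = Counter(tok for doc in feedback_docs for tok in doc)
    weights = {}
    for term, tf in tf_x.items():
        p_n = collection_tf.get(term, 0) / num_docs
        if p_n <= 0:
            # Terms without collection statistics cannot be scored; skip them.
            continue
        weights[term] = tf * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)
    return weights


def expand_query(query_terms, weights, k):
    """Append the k highest-weighted terms not already in the query (hypothetical helper)."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    new_terms = [t for t in ranked if t not in query_terms][:k]
    return list(query_terms) + new_terms
```

Varying `k` corresponds to the "different numbers of expansion terms" mentioned above; terms that are frequent in the feedback set but rare in the collection receive the highest weights.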
