Abstract

This article presents a new query expansion (QE) method aiming to tackle term mismatch in information retrieval (IR). Previous research showed that selecting good expansion terms which do not hurt retrieval effectiveness remains an open and challenging research question. Our method investigates how global statistics of term co-occurrence can be used effectively to enhance expansion term selection and reweighting. Indeed, we build a co-occurrence graph using a context window approach over the entire collection, thus adopting a global QE approach. Then, we employ a semantic similarity measure inspired by the Okapi BM25 model, which allows to evaluate the discriminative power of words and to select relevant expansion terms based on their similarity to the query as a whole. The proposed method includes a reweighting step where selected terms are assigned weights according to their relevance to the query. What’s more, our method does not require matrix factorisation or complex text mining processes. It only requires simple co-occurrence statistics about terms, which reduces complexity and insures scalability. Finally, it has two free parameters that may be tuned to adapt the model to the context of a given collection and control co-occurrence normalisation. Extensive experiments on four standard datasets of English (TREC Robust04 and Washington Post) and French (CLEF2000 and CLEF2003) show that our method improves both retrieval effectiveness and robustness in terms of various evaluation metrics and outperforms competitive state-of-the-art baselines with significantly better results. We also investigate the impact of varying the number of expansion terms on retrieval results.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call