Clusters, language models, and ad hoc information retrieval

Oren Kurland, Lillian Lee

doi:10.1145/1508850.1508851

Abstract

The language-modeling approach to information retrieval provides an effective statistical framework for tackling various problems and often achieves impressive empirical performance. However, most previous work on language models for information retrieval focused on document-specific characteristics, and therefore did not take into account the structure of the surrounding corpus, a potentially rich source of additional information. We propose a novel algorithmic framework in which information provided by document-based language models is enhanced by the incorporation of information drawn from clusters of similar documents. Using this framework, we develop a suite of new algorithms. Even the simplest typically outperforms the standard language-modeling approach in terms of mean average precision (MAP) and recall, and our new interpolation algorithm posts statistically significant performance improvements for both metrics over all six corpora tested. An important aspect of our work is the way we model corpus structure. In contrast to most previous work on cluster-based retrieval that partitions the corpus, we demonstrate the effectiveness of a simple strategy based on a nearest-neighbors approach that produces overlapping clusters.
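
As a rough illustration of the idea described in the abstract, the sketch below groups each document with its nearest neighbors to form overlapping clusters, then interpolates a document language model with the language models of the clusters that contain the document. The specifics are assumptions made for illustration and are not taken from the paper: cosine similarity over term counts, maximum-likelihood unigram models with a small probability floor `eps`, a plain average over clusters, and the hypothetical names `knn_clusters`, `interpolated_score`, `k`, and `lam`.

```python
import math
from collections import Counter


def unigram_lm(tokens):
    """Maximum-likelihood unigram model: term -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two term-count vectors (an assumed similarity measure)."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_clusters(docs, k):
    """Build one cluster per document: the document plus its k-1 nearest neighbors.

    A document can appear in several clusters, so the clusters overlap
    rather than partition the corpus.
    """
    vectors = [Counter(d) for d in docs]
    clusters = []
    for i, vec in enumerate(vectors):
        others = sorted(
            (j for j in range(len(docs)) if j != i),
            key=lambda j: cosine(vec, vectors[j]),
            reverse=True,
        )
        clusters.append([i] + others[: k - 1])
    return clusters


def query_likelihood(query, lm, eps=1e-9):
    """Probability of the query under a unigram model; eps is an assumed floor for unseen terms."""
    return math.prod(lm.get(term, eps) for term in query)


def interpolated_score(query, doc_id, docs, clusters, lam=0.5):
    """Mix the document's own score with the average score of the clusters containing it."""
    doc_score = query_likelihood(query, unigram_lm(docs[doc_id]))
    containing = [c for c in clusters if doc_id in c]  # never empty: each doc seeds a cluster
    cluster_scores = [
        query_likelihood(query, unigram_lm([t for j in c for t in docs[j]]))
        for c in containing
    ]
    cluster_score = sum(cluster_scores) / len(cluster_scores)
    return lam * doc_score + (1 - lam) * cluster_score
```

With a toy corpus such as `[["cat", "sat"], ["cat", "mat"], ["dog", "ran"], ["dog", "barked"]]`, `k=2`, and the query `["mat"]`, the first document never mentions "mat" yet still scores well above the "dog" documents, because its cluster includes a neighbor that does.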

Journal: ACM Transactions on Information Systems
Publication Date: May 1, 2009
Citations: 24