Retrieving Information from a Distributed Heterogeneous Document Collection

Christoph Baumgarten

doi:10.1023/a:1026572910743

Abstract

This paper describes a probabilistic model for optimum information retrieval in a distributed heterogeneous environment. The model assumes the collection of documents offered by the environment to be partitioned into subcollections. Documents as well as subcollections have to be indexed, where indexing methods using different indexing vocabularies can be employed. A query provided by a user is answered in terms of a ranked list of documents. The model determines a procedure for ranking the documents that stems from the Probability Ranking Principle: for each subcollection, the subcollection's documents are ranked; the resulting ranked lists are combined into a final ranked list of documents, where the ordering is determined by the documents' probabilities of being relevant with respect to the user's query. Various probabilistic ranking methods may be involved in the distributed ranking process. A criterion for effectively limiting the ranking process to a subset of subcollections extends the model. The property that different ranking methods and indexing vocabularies can be used is important when the subcollections are heterogeneous with respect to their content. The model's applicability is experimentally confirmed. When exploiting the degrees of freedom provided by the model, experiments showed evidence that the model even outperforms comparable models for the non-distributed case with respect to retrieval effectiveness.
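The ranking procedure described in the abstract (rank each subcollection, then combine the lists by probability of relevance) might be sketched roughly as follows. This is an illustration only, not the paper's implementation: the function name `merge_rankings`, the `selected` and `top_k` parameters, and the sample data are all hypothetical. The sketch assumes each subcollection already returns its documents sorted by an estimated probability of relevance, and that these estimates are comparable across subcollections. The abstract does not spell out the criterion for limiting the process to a subset of subcollections, so here the caller simply supplies the subset.

```python
import heapq
from typing import Dict, Iterable, List, Optional, Tuple

# One ranked list: (doc_id, estimated probability of relevance),
# already sorted by probability in descending order.
Ranked = List[Tuple[str, float]]


def merge_rankings(
    ranked_lists: Dict[str, Ranked],
    selected: Optional[Iterable[str]] = None,  # hypothetical: subcollections to consult
    top_k: Optional[int] = None,               # hypothetical: cut-off for the final list
) -> Ranked:
    """Combine per-subcollection rankings into one list ordered by probability."""
    names = list(ranked_lists) if selected is None else list(selected)
    streams = [ranked_lists[name] for name in names]
    # heapq.merge with reverse=True expects every input sorted largest-first.
    merged = list(heapq.merge(*streams, key=lambda pair: pair[1], reverse=True))
    return merged if top_k is None else merged[:top_k]


# Made-up example data: three subcollections with illustrative probabilities.
example = {
    "a": [("a1", 0.9), ("a2", 0.4)],
    "b": [("b1", 0.7), ("b2", 0.1)],
    "c": [("c1", 0.8)],
}
final_ranking = merge_rankings(example)
```

Calling `merge_rankings(example, selected=["a", "b"], top_k=3)` would restrict the merge to two subcollections and keep the three highest-scoring documents, which mirrors the idea of limiting the ranking process to a subset of subcollections.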

Journal: Information Retrieval
Publication Date: Jan 1, 2000
Citations: 17

Similar Papers

Document ranking with quantum probabilities
Guido Zuccon
ACM SIGIR Forum | VOL. 47
07 Jun 2012

Do Interviews Really Matter in Generating Programs and Applicants' Rank Lists for the Match?
Christopher Di Felice ... Pallavi Sharma
Southern Medical Journal | VOL. 115
01 Apr 2022

Building and Using Models of Information Seeking, Search and Retrieval
Leif Azzopardi ... Guido Zuccon
09 Aug 2015

Probabilistic models in IR and their relationships
Robin Aly ... Thomas Demeester
Information Retrieval | VOL. 17
26 Jun 2013
