Adaptive query-based sampling for distributed IR

Leif Azzopardi, Mark Baillie, Fabio Crestani

doi:10.1145/1148170.1148277

Abstract

In Distributed Information Retrieval (DIR) systems, the widely accepted solution for resource description acquisition is Query-Based Sampling (QBS) [1]. In the standard approach to QBS, sampling is curtailed once 300-500 unique documents have been retrieved. This threshold was obtained by empirically measuring the estimated resource description against the actual resource, and then considering the corresponding retrieval selection accuracy [1]. However, a fixed threshold may not generalise to collections and environments beyond those on which it was estimated (i.e. a set of resources of uniform size [1]). Cases where the blanket application of such a heuristic would be inappropriate include (1) when the sizes of resources are highly skewed and (2) when the resources are very heterogeneous. In the former case, if a resource is very large then undersampling will occur because not enough documents are obtained. Conversely, if a collection is very small, then oversampling will occur, increasing costs beyond necessity. In the latter case, if the resource is varied and highly heterogeneous, then obtaining a sufficiently accurate description would require more documents to be sampled than when resources are homogeneous. Either way, adopting a flat cutoff will not necessarily provide sufficiently good resource descriptions for all resources.
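
The fixed-threshold behaviour described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `make_toy_resource` and `query_based_sample`, the `docs_per_query` value of 4, and the artificial chain-structured collection (document i holds only the terms t{i} and t{i+1}) are all assumptions made for illustration. The sampler issues one-term queries, learns new query terms from the documents it retrieves, and stops at a flat cutoff of 300 unique documents.

```python
import random


def make_toy_resource(n_docs):
    """Build a hypothetical searchable resource: doc i holds terms t{i} and t{i+1}."""
    docs = {i: f"t{i} t{i + 1}" for i in range(n_docs)}
    index = {}
    for doc_id, text in docs.items():
        for term in text.split():
            index.setdefault(term, []).append(doc_id)

    def search(term):
        # Return (doc_id, text) pairs for documents matching a one-term query.
        return [(d, docs[d]) for d in index.get(term, [])]

    return search


def query_based_sample(search, seed_term, max_docs=300, docs_per_query=4, seed=0):
    """Sample a resource with one-term queries until a fixed cutoff is reached."""
    rng = random.Random(seed)
    sampled = {}              # doc_id -> text of each unique sampled document
    vocabulary = {seed_term}  # terms learned so far from sampled documents
    used = set()              # terms already issued as queries
    while len(sampled) < max_docs:
        candidates = sorted(vocabulary - used)
        if not candidates:
            break  # no unused query terms remain, so sampling cannot continue
        term = rng.choice(candidates)
        used.add(term)
        for doc_id, text in search(term)[:docs_per_query]:
            if len(sampled) >= max_docs:
                break
            if doc_id not in sampled:
                sampled[doc_id] = text
                vocabulary.update(text.split())
    return sampled


# Same flat cutoff applied to a small and a large toy resource.
for size in (100, 10_000):
    sample = query_based_sample(make_toy_resource(size), "t0")
    print(size, len(sample), len(sample) / size)
```

On this toy data the 100-document resource is downloaded in full, while the 10,000-document resource is represented by only 3% of its documents, which mirrors the oversampling and undersampling cases above. How many documents a real resource actually needs depends on its size and heterogeneity, which this sketch does not model.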

Similar Papers

An evaluation of resource description quality measures
Mark Baillie ... Leif Azzopardi
23 Apr 2006

Adaptive Query-Based Sampling of Distributed Collections
Mark Baillie ... Leif Azzopardi
01 Jan 2006

Exploiting Social Annotations to Generate Resource Descriptions in a Distributed Environment: Cooperative Multi-Agent Simulation on Query-Based Sampling
Zakaria Saoud ... Antoine Doucet
The Review of Socionetwork Strategies | VOL. 11
01 Jun 2017

Query-based sampling of text databases
Jamie Callan ... Margaret Connell
ACM Transactions on Information Systems | VOL. 19
01 Apr 2001

Publication Date: Aug 6, 2006
Citations: 17	License type: gpl
