An access cost-aware approach for object retrieval over multiple sources

Benjamin Arai, Gautam Das, Nick Koudas, Dimitrios Gunopulos, Vagelis Hristidis

doi:10.14778/1920841.1920982

Abstract

Source and object selection and retrieval from large multi-source data sets are fundamental operations in many applications. In this paper, we initiate research on efficient source (e.g., database) and object selection algorithms for large multi-source data sets. Specifically, to acquire a specified number of satisfying objects at minimum cost over multiple databases, the query engine needs to determine the access overhead of each data source, the overhead of retrieving objects from each source, and possibly other statistics, such as the estimated frequency of finding a satisfying object, in order to decide how many objects to retrieve from each data source. We adopt a probabilistic approach to source selection that uses a cost structure and a dynamic programming model to compute the optimal number of objects to retrieve from each data source. Such a structure can be a valuable asset wherever there is a monetary or time-related cost associated with accessing large distributed databases. We present a thorough experimental evaluation to validate our techniques using real-world data sets.
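The abstract does not spell out the cost structure or the dynamic program, so the following is only a minimal sketch of one way such a model could look, not the authors' algorithm. It assumes (hypothetically) that each source has a fixed access cost paid once if the source is used, a per-object retrieval cost, an estimated hit rate (the probability that a retrieved object satisfies the query), and a finite size. It also assumes that obtaining t satisfying objects from a source requires fetching about ceil(t / hit_rate) objects in expectation. All names (`Source`, `plan_retrieval`, the field names) are invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    access_cost: float  # assumed: fixed overhead, paid once if the source is used
    object_cost: float  # assumed: cost of retrieving one object
    hit_rate: float     # assumed: estimated probability an object satisfies the query
    size: int           # number of objects available in the source


def plan_retrieval(sources, needed):
    """Return (min_cost, {source name: objects to fetch}) for `needed` satisfying
    objects in expectation, or (inf, {}) if the sources cannot supply them."""
    inf = math.inf
    # best[j]: cheapest way to obtain j satisfying objects from the sources seen so far
    best = [0.0] + [inf] * needed
    choices = []  # choices[i][j]: satisfying objects assigned to source i in state j
    for s in sources:
        new = best[:]              # default: take nothing from this source
        pick = [0] * (needed + 1)
        if s.hit_rate > 0:
            for j in range(1, needed + 1):
                for t in range(1, j + 1):
                    # expected number of fetches to see t hits (epsilon guards float noise)
                    fetch = math.ceil(t / s.hit_rate - 1e-9)
                    if fetch > s.size:
                        break      # larger t only needs more fetches
                    if best[j - t] == inf:
                        continue
                    cost = best[j - t] + s.access_cost + s.object_cost * fetch
                    if cost < new[j]:
                        new[j] = cost
                        pick[j] = t
        best = new
        choices.append(pick)
    if best[needed] == inf:
        return inf, {}
    # Walk the choice tables backwards to recover the per-source fetch counts.
    plan = {}
    j = needed
    for s, pick in zip(reversed(sources), reversed(choices)):
        t = pick[j]
        if t:
            plan[s.name] = math.ceil(t / s.hit_rate - 1e-9)
        j -= t
    return best[needed], plan
```

Under these assumptions, a source with a high access cost but cheap objects wins only when many objects are needed, while a source with a low access cost but expensive objects is preferable for small requests. A real system would also have to account for variance in the hit rate rather than planning on expectations alone.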

Journal: Proceedings of the VLDB Endowment
Publication Date: Sep 1, 2010
Citations: 12

Similar Papers

Integrating domain heterogeneous data sources using decomposition aggregation queries
Jian Xu ... Rachel Pottinger
Information Systems | VOL. 39
19 Jun 2013

Efficient quality-driven source selection from massive data sources
Yiming Lin ... Hong Gao
Journal of Systems and Software | VOL. 118
17 May 2016

Protein Identification False Discovery Rates for Very Large Proteomics Data Sets Generated by Tandem Mass Spectrometry
Lukas Reiter ... Ruedi Aebersold
Molecular & Cellular Proteomics | VOL. 8
01 Nov 2009

Practical guidance for using multiple data sources in systematic reviews and meta-analyses (with examples from the MUDS study).
Evan Mayo‐Wilson ... Nicole Fusco
Research Synthesis Methods | VOL. 9
15 Dec 2017
