Distributed top-k aggregation queries at large

Thomas Neumann, Ralf Schenkel, Peter Triantafillou, Sebastian Michel, Matthias Bender, Gerhard Weikum

doi:10.1007/s10619-009-7041-z

Abstract

Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially in distributed settings, where the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments with three different real-life datasets, using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
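
For readers unfamiliar with the TPUT framework mentioned above, the following is a minimal sketch of the basic three-phase uniform-threshold scheme for a sum-aggregated top-k query. It illustrates the prior baseline only, not the optimizations this paper introduces (operator trees, adaptive scan depths, sampling). The function names, the in-memory representation of each node as a dictionary, and the assumption of non-negative scores are illustrative choices and do not come from the paper.

```python
import heapq
from collections import defaultdict


def kth_largest(values, k):
    """Return the k-th largest value, or 0.0 if fewer than k values exist."""
    top = heapq.nlargest(k, values)
    return top[-1] if len(top) >= k else 0.0


def tput_top_k(nodes, k):
    """Sketch of a TPUT-style top-k query with sum aggregation.

    nodes: list of dicts (hypothetical stand-in for network nodes),
           each mapping item -> non-negative local score.
    """
    m = len(nodes)
    partial = defaultdict(float)  # partial sums known to the coordinator
    seen = defaultdict(set)       # item -> indices of nodes that reported it

    # Phase 1: every node reports its local top-k items.
    for i, scores in enumerate(nodes):
        for item, s in heapq.nlargest(k, scores.items(), key=lambda kv: kv[1]):
            partial[item] += s
            seen[item].add(i)
    tau1 = kth_largest(partial.values(), k)
    threshold = tau1 / m  # uniform per-node threshold

    # Phase 2: every node reports all items with local score >= threshold.
    for i, scores in enumerate(nodes):
        for item, s in scores.items():
            if s >= threshold and i not in seen[item]:
                partial[item] += s
                seen[item].add(i)
    tau2 = kth_largest(partial.values(), k)

    # Any score not reported so far is below the threshold, which gives an
    # upper bound on each item's total; prune items that cannot reach tau2.
    candidates = [
        item for item, p in partial.items()
        if p + threshold * (m - len(seen[item])) >= tau2
    ]

    # Phase 3: fetch exact scores for the surviving candidates only.
    totals = {
        item: sum(scores.get(item, 0.0) for scores in nodes)
        for item in candidates
    }
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

The uniform threshold `tau1 / m` is the part that the abstract's second degree of freedom, data-adaptive scan depths, replaces with per-source values; the sketch keeps the uniform version for simplicity.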

Journal: Distributed and Parallel Databases
Publication Date: Jun 18, 2009
Citations: 63
License type: cc-by-nc

Similar Papers

Optimizing Distributed Top-k Queries
Thomas Neumann ... Gerhard Weikum
01 Jan 2008

Efficient framework for processing top-k queries with replication in mobile ad hoc networks
Yuya Sasaki ... Yoshiharu Ishikawa
GeoInformatica | VOL. 23
14 May 2019

Top-k Query Processing with Replication Strategy in Mobile Ad Hoc Networks
Yuya Sasaki ... Takahiro Hara
01 Jun 2018

Top-K Aggregate Queries on Continuous Probabilistic Datasets
Jianwen Chen ... Jun Zhang
01 Jan 2013
