Sampling issues in parallel database systems

S Seshadri, Jeffrey F Naughton

doi:10.1007/bfb0032440

Abstract

Sampling has proven useful in database systems in applications including query size estimation and, most recently, probabilistic parallel query evaluation algorithms. In order to apply the full power of modern multiprocessor database systems, sampling techniques must (1) distribute the sampling workload evenly among the processors in the system, and (2) make use of all the data on the pages brought into main memory during the course of the sampling. In this paper we show how to achieve these two goals by proving that for query size estimation, (1) stratified random sampling guarantees perfect load balancing without reducing the accuracy of the estimate, and that (2) for a given number of I/O operations, page level sampling always produces a more accurate estimate than tuple level sampling. For probabilistic parallel query evaluation algorithms, high performance requires tight bounds on the expected skew in the allocation of work to processors as a function of the number of samples. Toward this end we prove a new bound on this skew, and show that our new bound is better than previously known bounds.
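As a rough illustration of the first claim, stratified sampling for query size estimation can be sketched as follows. This is not the authors' algorithm: the function name `stratified_estimate`, the in-memory list-of-lists layout for partitions, and the choice to draw the same number of samples on every processor are all assumptions made for the example.

```python
import random

def stratified_estimate(partitions, predicate, samples_per_node, rng):
    """Estimate how many tuples across all partitions satisfy `predicate`.

    Hypothetical sketch: each partition stands for the data held by one
    processor, and each processor draws the same number of samples, so
    the sampling work is spread evenly when partitions are similar in size.
    """
    estimate = 0.0
    for part in partitions:
        if not part:
            continue
        k = min(samples_per_node, len(part))
        sample = rng.sample(part, k)  # simple random sample within the stratum
        hits = sum(1 for t in sample if predicate(t))
        # Scale the stratum's observed hit rate up to the stratum's size.
        estimate += len(part) * hits / k
    return estimate

# Two equally sized partitions, one per (simulated) processor.
partitions = [list(range(0, 100)), list(range(100, 200))]
rng = random.Random(0)
approx = stratified_estimate(partitions, lambda t: t % 2 == 0, 20, rng)
```

Here `approx` is an estimate of the number of even values among the 200 tuples (the true count is 100); how close it lands depends on the samples drawn.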

Similar Papers

Parallel Database Techniques

Scalable Computing Practice and Experience | VOL. 4

03 Jan 2001

Dynamic action scheduling in a parallel database system
Paul W P J Grefen ... Peter M G Apers
01 Jan 1992

Heuristic optimization of speedup and benefit/cost for parallel database scans on shared-memory multiprocessors
M Rys ... G Weikum
01 Apr 1994

Uniform partitioning of relations using histogram equalization framework: An efficient parallel hash-based join
Ung Kyu Park ... Tag Gon Kim
Information Processing Letters | VOL. 55
01 Sep 1995
