Approximation and streaming algorithms for histogram construction problems

Sudipto Guha, Kyuseok Shim, Nick Koudas

doi:10.1145/1132863.1132873

Abstract

Histograms and related synopsis structures are popular techniques for approximating data distributions. They have been successful in query optimization and in a variety of applications, including approximate querying, similarity searching, and data mining. Histograms were among the earliest synopsis structures proposed and continue to be used widely. The histogram construction problem is to construct, within a given space bound, the histogram that reflects the data distribution most accurately under a given error measure.

Histograms are used as quick and easy estimates. Thus, a slight loss of accuracy, compared to the optimal histogram under the given error measure, can be offset by fast histogram construction algorithms. A natural question arises in this context: Can we find a fast near-optimal approximation algorithm for the histogram construction problem? In this article, we give the first linear-time (1+ϵ)-factor approximation algorithms (for any ϵ > 0) for a large number of histogram construction problems, including the use of piecewise small-degree polynomials to approximate data, workloads, etc. Several of our algorithms extend to data streams.

Using synthetic and real-life data sets, we demonstrate that in many scenarios the approximate histograms are almost identical to optimal histograms in quality and are significantly faster to construct.
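To make the problem statement concrete, the sketch below computes an optimal histogram with the classic exact dynamic program, assuming the error measure is the sum of squared errors and each bucket stores the mean of its values. This is the slow baseline that approximate histograms are compared against, not the (1+ϵ)-approximation algorithm of the article; the function name `v_optimal_histogram` and its parameters are illustrative assumptions.

```python
from itertools import accumulate


def v_optimal_histogram(data, num_buckets):
    """Exact dynamic program for a minimum squared-error histogram.

    Assumptions (not taken from the article): the error measure is the sum
    of squared errors, and each bucket is represented by the mean of the
    values it covers. Runs in O(n^2 * B) time.
    Returns (total_error, [(start, end, mean), ...]) with half-open ranges.
    """
    n = len(data)
    if n == 0 or num_buckets <= 0:
        raise ValueError("need non-empty data and a positive bucket count")
    # More buckets than points cannot help, so cap the space bound at n.
    num_buckets = min(num_buckets, n)

    # Prefix sums of values and squared values give O(1) bucket errors.
    prefix = [0.0] + list(accumulate(float(x) for x in data))
    prefix_sq = [0.0] + list(accumulate(float(x) * float(x) for x in data))

    def sse(i, j):
        # Squared error of representing data[i:j] by its mean.
        total = prefix[j] - prefix[i]
        err = (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)
        return max(0.0, err)  # guard against tiny negative rounding error

    inf = float("inf")
    # best[k][j]: minimum error covering data[:j] with exactly k buckets.
    best = [[inf] * (n + 1) for _ in range(num_buckets + 1)]
    cut = [[0] * (n + 1) for _ in range(num_buckets + 1)]
    best[0][0] = 0.0
    for k in range(1, num_buckets + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last bucket covers data[i:j]
                cand = best[k - 1][i] + sse(i, j)
                if cand < best[k][j]:
                    best[k][j] = cand
                    cut[k][j] = i

    # Walk the stored cut points backwards to recover the buckets.
    buckets = []
    j = n
    for k in range(num_buckets, 0, -1):
        i = cut[k][j]
        buckets.append((i, j, (prefix[j] - prefix[i]) / (j - i)))
        j = i
    buckets.reverse()
    return best[num_buckets][n], buckets


error, buckets = v_optimal_histogram([1, 1, 1, 5, 5, 5, 9, 9], 3)
```

On this small input the three buckets align with the three runs of equal values, so the total error is zero. The quadratic dependence on the input length is what makes a linear-time approximation attractive for large inputs.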

Journal: ACM Transactions on Database Systems
Publication Date: Mar 1, 2006
Citations: 158

Similar Papers

GEOMETRIC ALGORITHMS FOR DENSITY-BASED DATA CLUSTERING
Danny Z Chen ... Michiel Smid
International Journal of Computational Geometry & Applications | VOL. 15
01 Jun 2005

XWAVE: Optimal and approximate Extended Wavelets for Streaming Data
S GUHA ... C KIM
01 Jan 2004

XWAVE: Optimal and approximate Extended Wavelets for Streaming Data
Sudipto Guha ... Kyuseok Shim
Proceedings 2004 VLDB Conference
01 Jan 2004

REHIST: Relative Error Histogram Construction Algorithms
S GUHA ... K SHIM
01 Jan 2004
