Abstract

Frequency estimation, also known as the Point Query problem, is one of the most fundamental problems in streaming algorithms. Given a stream $S$ of elements from some universe $U = \{1, \dots, n\}$, the goal is to compute, in a single pass, a short "sketch" of $S$ so that for any element $i \in U$, one can estimate the number $x_i$ of times $i$ occurs in $S$ based on the sketch alone. Two state-of-the-art solutions to this problem are the Count-Min and Count-Sketch algorithms. They are based on linear sketches, which means that data elements can be deleted as well as inserted, and sketches for two different streams can be combined via addition. However, the guarantees offered by Count-Min and Count-Sketch are incomparable. The frequency estimator $\tilde{x}$ produced by the Count-Min sketch, using $O(1/\epsilon \cdot \log n)$ dimensions, guarantees that $\|\tilde{x} - x\|_\infty \le \epsilon \|x\|_1$ with high probability, and $\tilde{x} \ge x$ holds deterministically. Also, Count-Min works under the assumption that $x \ge 0$. On the other hand, Count-Sketch, using $O(1/\epsilon^2 \cdot \log n)$ dimensions, guarantees that $\|\tilde{x} - x\|_\infty \le \epsilon \|x\|_2$ with high probability.

A natural question is whether it is possible to design a "best of both worlds" sketching method, with error guarantees depending on the $\ell_2$ norm and space comparable to Count-Sketch, that (like Count-Min) also has the no-underestimation property. Our main set of results shows that the answer to this question is negative. We show this in two incomparable computational models: linear sketching and streaming algorithms. Specifically, we show that:

(1) Any linear sketch satisfying the $\ell_p$ norm error guarantee with probability at least 2/3 and having the no-underestimation property must have dimension at least $\Omega(n^{1-1/p}/\epsilon)$, even if the sketched vectors are non-negative. This bound is tight, as we also give a linear sketch of dimension $O(n^{1-1/p}/\epsilon)$ satisfying these properties.

(2) Any streaming algorithm satisfying the $\ell_p$ norm error guarantee with probability at least 2/3 and having the no-underestimation property must use at least $\Omega(n^{1-1/p}/\epsilon)$ bits. This holds even for algorithms that only allow insertions and make any constant number of passes over the stream. This bound is tight up to a logarithmic factor.

We also study the complementary problem, where the sketch is required not to over-estimate, i.e., $\tilde{x} \le x$ should always hold. We show that any linear sketch satisfying this property and having the $\ell_p$ error guarantee with probability at least 2/3 must have dimension at least $\Omega(n^{1-1/p}/\epsilon)$. We also show that this bound is tight up to polylogarithmic factors, by providing an appropriate linear sketch.
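
To make the contrast between the two baseline sketches concrete, here is a minimal Python sketch of textbook Count-Min and Count-Sketch. It is not the construction from this paper. The class names, the `make_hash` helper, the hash family, and the widths and depths are all assumptions chosen for illustration, and the sign function is a simplified stand-in for a properly independent sign hash.

```python
import random
from collections import Counter

PRIME = (1 << 61) - 1  # Mersenne prime for the illustrative hash family


def make_hash(rng):
    """Draw h(i) = (a*i + b) mod PRIME (hypothetical helper, not from the paper)."""
    a, b = rng.randrange(1, PRIME), rng.randrange(PRIME)
    return lambda i: (a * i + b) % PRIME


class CountMin:
    """Textbook Count-Min: depth rows of width counters, estimate = row minimum."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.hashes = [make_hash(rng) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def update(self, i, delta=1):
        # Linear update: delta may be negative, which models a deletion.
        for row, h in zip(self.table, self.hashes):
            row[h(i) % self.width] += delta

    def estimate(self, i):
        # When x >= 0, every counter holds x_i plus non-negative collisions,
        # so the minimum over rows can never fall below x_i.
        return min(row[h(i) % self.width]
                   for row, h in zip(self.table, self.hashes))


class CountSketch:
    """Textbook Count-Sketch: signed counters, estimate = median over rows."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.hashes = [make_hash(rng) for _ in range(depth)]
        self.signs = [make_hash(rng) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    @staticmethod
    def _sign(s, i):
        # Simplified +1/-1 sign taken from the low bit of a hash value.
        return 1 if s(i) & 1 else -1

    def update(self, i, delta=1):
        for row, h, s in zip(self.table, self.hashes, self.signs):
            row[h(i) % self.width] += self._sign(s, i) * delta

    def estimate(self, i):
        # Collisions enter with random signs, so the error is two-sided.
        vals = sorted(self._sign(s, i) * row[h(i) % self.width]
                      for row, h, s in zip(self.table, self.hashes, self.signs))
        return vals[len(vals) // 2]  # median, assuming odd depth


# Small demonstration on a synthetic insertion-only stream.
rng = random.Random(1)
stream = [rng.randrange(1, 51) for _ in range(2000)]
truth = Counter(stream)

cm = CountMin(width=20, depth=5)
cs = CountSketch(width=20, depth=5)
for item in stream:
    cm.update(item)
    cs.update(item)

for item in (1, 2, 3):
    print(item, truth[item], cm.estimate(item), cs.estimate(item))
```

In this sketch the Count-Min estimate is never below the true count on a non-negative stream, while the Count-Sketch estimate can land on either side of it. That two-sided error is the gap the question above asks about.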
