Abstract
In the turnstile $\ell_p$ heavy hitters problem with parameter $\varepsilon$, one must maintain a high-dimensional vector $x \in \mathbb{R}^n$ subject to updates of the form update$(i, \Delta)$ causing the change $x_i \leftarrow x_i + \Delta$, where $i \in [n]$, $\Delta \in \mathbb{R}$. Upon receiving a query, the goal is to report every heavy hitter $i \in [n]$ with $|x_i| \ge \varepsilon \|x\|_p$ as part of a list $L \subseteq [n]$ of size $O(1/\varepsilon^p)$, i.e. proportional to the maximum possible number of heavy hitters. For any $p \in (0, 2]$ the CountSketch of [CCFC04] solves $\ell_p$ heavy hitters using $O(\varepsilon^{-p} \log n)$ words of space with $O(\log n)$ update time, $O(n \log n)$ query time to output $L$, and whose output after any query is correct with high probability (whp) $1 - 1/\mathrm{poly}(n)$ [JST11, Section 4.4]. This space bound is optimal even in the strict turnstile model [JST11], in which it is promised that $x_i \ge 0$ for all $i \in [n]$ at all points in the stream, but unfortunately the query time is very slow. To remedy this, the work [CM05] proposed the dyadic trick for the CountMin sketch for $p = 1$ in the strict turnstile model, which to maintain whp correctness achieves suboptimal space $O(\varepsilon^{-1} \log^2 n)$, worse update time $O(\log^2 n)$, but much better query time $O(\varepsilon^{-1} \mathrm{poly}(\log n))$. An extension to all $p \in (0, 2]$ appears in [KNPW11, Theorem 1], and can be obtained from [Pag13].
We show that this tradeoff between space and update time versus query time is unnecessary. We provide a new algorithm, ExpanderSketch, which in the most general turnstile model achieves optimal $O(\varepsilon^{-p} \log n)$ space, $O(\log n)$ update time, and fast $O(\varepsilon^{-p} \mathrm{poly}(\log n))$ query time, providing correctness whp. In fact, a simpler version of our algorithm for $p = 1$ in the strict turnstile model answers queries even faster than the dyadic trick by roughly a $\log n$ factor, dominating it in all regards.
Our main innovation is an efficient reduction from the heavy hitters problem to a clustering problem in which each heavy hitter is encoded as some form of noisy spectral cluster in a much bigger graph, and the goal is to identify every cluster. Since every heavy hitter must be found, correctness requires that every cluster be found. We thus need a "cluster-preserving clustering" algorithm that partitions the graph into clusters with the promise of not destroying any original cluster. To do this we first apply standard spectral graph partitioning, and then we use some novel combinatorial techniques to modify the cuts obtained so as to make sure that the original clusters are sufficiently preserved. Our cluster-preserving clustering may be of broader interest much beyond heavy hitters.
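To make the baseline concrete, the following is a minimal Python sketch of a CountSketch-style structure with turnstile updates and the slow query that scans all $n$ indices. It is an illustration under stated assumptions, not the construction analyzed in the paper: the class name `CountSketch`, the parameters `width`, `depth`, and `threshold`, and the simple modular hash functions are all hypothetical choices made here. In the parameterization above one would take `depth` on the order of $\log n$ and `width` on the order of $\varepsilon^{-p}$, and the caller is assumed to supply the reporting threshold (roughly $\varepsilon \|x\|_p$) rather than have the sketch estimate the norm.

```python
import random
from statistics import median

P = (1 << 61) - 1  # a Mersenne prime, assumed larger than any index we hash


class CountSketch:
    """Toy CountSketch over indices 0..n-1 (illustrative, not tuned)."""

    def __init__(self, n, width, depth, seed=0):
        rng = random.Random(seed)
        self.n, self.width, self.depth = n, width, depth
        # One (a, b, c, d) tuple per row: (a, b) picks the bucket, (c, d) the sign.
        # Assumption: simple modular hashes stand in for the hash families
        # a careful analysis would require.
        self.coeffs = [tuple(rng.randrange(1, P) for _ in range(4))
                       for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _bucket_and_sign(self, row, i):
        a, b, c, d = self.coeffs[row]
        bucket = ((a * i + b) % P) % self.width
        sign = 1 if ((c * i + d) % P) & 1 else -1
        return bucket, sign

    def update(self, i, delta):
        """Apply x_i <- x_i + delta; delta may be negative (turnstile)."""
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, i)
            self.table[row][bucket] += sign * delta

    def estimate(self, i):
        """Median over rows of the signed counter that index i hashes to."""
        vals = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, i)
            vals.append(sign * self.table[row][bucket])
        return median(vals)

    def query(self, threshold):
        """Slow query: scan all n indices, reading depth counters for each."""
        return [i for i in range(self.n)
                if abs(self.estimate(i)) >= threshold]


if __name__ == "__main__":
    sk = CountSketch(n=1000, width=256, depth=9, seed=1)
    for i, delta in [(7, 100), (42, -80), (7, 5), (3, 2), (3, -2)]:
        sk.update(i, delta)
    print(sk.query(threshold=50))
```

Each update touches one counter per row, which matches the $O(\log n)$ update time when `depth` is logarithmic, while `query` evaluates an estimate for every index, which is the $O(n \log n)$ cost that the dyadic trick and ExpanderSketch are designed to avoid.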