No Repetition

Anders Aamand, Evangelos Kipouridis, Mikkel Thorup, Jakob B T Knudsen, Debarati Das, Peter M R Rasmussen

doi:10.14778/3565838.3565851

Abstract

Stochastic sample-based estimators are among the most fundamental and universally applied tools in statistics. Such estimators are particularly important when processing huge amounts of data, where we need to answer a wide range of statistical queries reliably yet cannot afford to store the data in full. In many applications the sampling must be coordinated, which is typically achieved using hashing. In previous work, a common strategy for obtaining reliable sample-based estimators that stay within certain error bounds with high probability has been to design an estimator that works with constant probability, and then boost the probability by taking the median over r independent repetitions. Aamand et al. (STOC'20) recently proposed Tabulation-1Permutation, a fast and practical hashing scheme with strong concentration bounds, the first of its kind. In this paper, we demonstrate that by using such a hash family for the sampling, we achieve the same high-probability bounds without any need for repetitions. Using the same space, this saves a factor r in time and simplifies the overall algorithms. We validate our approach experimentally on both real and synthetic data. We compare Tabulation-1Permutation with other hash functions, including strongly universal hash functions, MurmurHash3, and BLAKE3, both with and without resorting to repetitions. We see that if we want reliability in terms of small error probabilities, then Tabulation-1Permutation is significantly faster.
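To make the contrast between the two strategies concrete, the following is a minimal sketch of hash-based coordinated sampling, with and without the median over r repetitions. It is an illustration under stated assumptions and not the paper's implementation: the function names (`unit_hash`, `threshold_sample`, `estimate_distinct`, `median_of_repetitions`) are hypothetical, the estimated quantity (number of distinct keys) is chosen only as an example, and salted BLAKE2b from Python's standard library stands in for the hash function. It is not Tabulation-1Permutation and carries none of its concentration guarantees.

```python
import hashlib
import statistics


def unit_hash(key: int, seed: int) -> float:
    """Map a non-negative 64-bit key to a pseudo-random value in [0, 1).

    Stand-in hash (salted BLAKE2b), NOT Tabulation-1Permutation.
    """
    digest = hashlib.blake2b(
        key.to_bytes(8, "little"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    # Keep the top 53 bits so the result is an exact float strictly below 1.
    return (int.from_bytes(digest, "little") >> 11) / 2**53


def threshold_sample(keys, p: float, seed: int) -> set:
    """Keep each key whose hash is below p.

    The decision depends only on (key, seed), so samples drawn from
    different streams with the same seed are coordinated.
    """
    return {k for k in keys if unit_hash(k, seed) < p}


def estimate_distinct(keys, p: float, seed: int) -> float:
    """Single-pass estimate of the number of distinct keys: |sample| / p."""
    return len(threshold_sample(keys, p, seed)) / p


def median_of_repetitions(keys, p: float, r: int) -> float:
    """Classic boosting: r independently seeded estimates, then the median.

    This hashes every key r times, which is the factor-r cost in time.
    """
    keys = list(keys)  # allow several passes over a one-shot iterable
    return statistics.median(estimate_distinct(keys, p, seed) for seed in range(r))
```

In this sketch, `estimate_distinct` hashes each key once, whereas `median_of_repetitions` hashes each key r times. The abstract's claim is that a hash family with strong concentration bounds lets the single-pass variant reach the same reliability, which this stand-in hash does not demonstrate by itself.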

Journal: Proceedings of the VLDB Endowment
Publication Date: Sep 1, 2022
Citations: 1

Similar Papers

Polynomial hash functions are reliable
M Dietzfelbinger ... Y Matias
01 Jan 1992

The Power of Two Choices with Simple Tabulation
Søren Dahlgaard ... Mathias Bæk Tejs Knudsen
21 Dec 2015

The power of two choices with simple tabulation
...
10 Jan 2016

Security of Practical Cryptosystems Using Merkle-Damgård Hash Function in the Ideal Cipher Model
Yusuke Naito ... Kazuo Ohta
01 Jan 2010
