Linear hash functions

Noga Alon, Martin Dietzfelbinger, Gábor Tardos, Erez Petrank, Peter Bro Miltersen

doi:10.1145/324133.324179

Abstract

Consider the set $\mathcal{H}$ of all linear (or affine) transformations between two vector spaces over a finite field $F$. We study how good $\mathcal{H}$ is as a class of hash functions: we hash a set $S$ of size $n$ into a range of the same cardinality $n$ using a function chosen at random from $\mathcal{H}$, and look at the expected size of the largest hash bucket. $\mathcal{H}$ is a universal class of hash functions for any finite field, but with respect to our measure different fields behave differently. If the finite field $F$ has $n$ elements, then there is a bad set $S \subset F^2$ of size $n$ with expected maximal bucket size $\Omega(n^{1/3})$. If $n$ is a perfect square, then there is even a bad set whose largest bucket size is always at least $\sqrt{n}$. (This is the worst possible, since with respect to a universal class of hash functions every set of size $n$ has expected largest bucket size below $\sqrt{n} + 1/2$.) If, however, we consider the field of two elements, then we get much better bounds. The best previously known upper bound on the expected size of the largest bucket for this class was $O(2^{\sqrt{\log n}})$. We reduce this upper bound to $O(\log n \log\log n)$. This is not far from the guarantee for a random function, for which the expected largest bucket size is $\Theta(\log n / \log\log n)$. In the course of our proof we develop a tool which may be of independent interest. Suppose we have a subset $S$ of a vector space $D$ over $\mathbb{Z}_2$, and consider a random linear mapping of $D$ to a smaller vector space $R$. If the cardinality of $S$ is larger than $c_\varepsilon |R| \log |R|$, then with probability $1 - \varepsilon$ the image of $S$ will cover all elements in the range.
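The parenthetical claim that no set of size $n$ does worse than $\sqrt{n} + 1/2$ under a universal class follows from a short collision-counting argument. The derivation below is the standard one, sketched for the reader and not necessarily the paper's own proof; $C$ (the number of colliding unordered pairs) and $L$ (the largest bucket size) are notation introduced here, and universality is taken to mean $\Pr[h(x) = h(y)] \le 1/n$ for $x \ne y$ when the range has $n$ elements.

```latex
\begin{align*}
\mathbb{E}[C] &\le \binom{n}{2} \cdot \frac{1}{n} = \frac{n-1}{2}
  && \text{(universality, linearity of expectation)} \\
\binom{L}{2} &\le C
  \;\Longrightarrow\; \mathbb{E}[L^2 - L] \le n - 1
  && \text{(pairs inside the largest bucket all collide)} \\
\mathbb{E}[L]^2 - \mathbb{E}[L] &\le \mathbb{E}[L^2] - \mathbb{E}[L] \le n - 1
  && \text{(Jensen)} \\
\mathbb{E}[L] &\le \frac{1}{2} + \sqrt{n - \tfrac{3}{4}} < \sqrt{n} + \frac{1}{2}
  && \text{(solve the quadratic in } \mathbb{E}[L])
\end{align*}
```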
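For $F = \mathbb{Z}_2$, the measured quantity can be estimated empirically. The following is a minimal Python sketch, assuming vectors in $\mathbb{Z}_2^d$ are encoded as $d$-bit integers and a $k \times d$ matrix as a list of $k$ row bit masks. The helper names (`parity`, `apply_map`, `largest_bucket`, `expected_largest_bucket`) are hypothetical. The sketch only measures the expected largest bucket by Monte Carlo sampling; it does not reproduce the paper's bounds or proofs.

```python
import random
from collections import Counter


def parity(v: int) -> int:
    """Parity (sum mod 2) of the bits of v, i.e. an inner product over Z_2."""
    return bin(v).count("1") & 1


def apply_map(rows: list[int], x: int, offset: int = 0) -> int:
    """Compute A x + b over Z_2.

    Vectors are bit masks; rows[i] is row i of A and offset is b
    (offset = 0 gives a linear map, nonzero offset an affine one).
    """
    y = 0
    for i, row in enumerate(rows):
        y |= parity(row & x) << i
    return y ^ offset


def largest_bucket(S: list[int], rows: list[int], offset: int = 0) -> int:
    """Size of the fullest bucket when S is hashed by x -> A x + b."""
    counts = Counter(apply_map(rows, x, offset) for x in S)
    return max(counts.values())


def expected_largest_bucket(S: list[int], d: int, k: int, trials: int,
                            seed: int = 0, affine: bool = False) -> float:
    """Monte Carlo estimate of E[largest bucket] for a uniformly random A (and b)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        rows = [rng.getrandbits(d) for _ in range(k)]  # uniform k x d matrix over Z_2
        b = rng.getrandbits(k) if affine else 0
        total += largest_bucket(S, rows, b)
    return total / trials


if __name__ == "__main__":
    d, k = 20, 10                     # hash Z_2^20 into Z_2^10, so n = |R| = 1024
    n = 1 << k
    S = random.Random(1).sample(range(1 << d), n)  # an arbitrary set of size n
    print(expected_largest_bucket(S, d, k, trials=20))
```

A random $S$ is not a worst case: the abstract's bounds range over every set of size $n$, so this example illustrates only how the quantity is measured. Replacing `S` with a structured set is how one would probe the bad-set behaviour.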
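The covering tool in the last sentences can also be checked empirically. The Python sketch below estimates how often a uniformly random linear map $\mathbb{Z}_2^d \to \mathbb{Z}_2^k$ sends a given $S$ onto all of $R = \mathbb{Z}_2^k$. The abstract does not specify the constant $c_\varepsilon$ or the base of the logarithm. The value `c = 2.0` and the use of `math.log2` are assumptions, as are the helper names (`parity`, `apply_map`, `coverage_probability`).

```python
import math
import random


def parity(v: int) -> int:
    """Parity of the bits of v (inner product over Z_2)."""
    return bin(v).count("1") & 1


def apply_map(rows: list[int], x: int) -> int:
    """Linear map A x over Z_2, with rows[i] the bit mask of row i of A."""
    y = 0
    for i, row in enumerate(rows):
        y |= parity(row & x) << i
    return y


def coverage_probability(S: list[int], d: int, k: int, trials: int,
                         seed: int = 0) -> float:
    """Fraction of random linear maps Z_2^d -> Z_2^k whose image of S is all of Z_2^k."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        rows = [rng.getrandbits(d) for _ in range(k)]  # uniform k x d matrix
        image = {apply_map(rows, x) for x in S}
        hits += len(image) == (1 << k)                 # did S cover the range?
    return hits / trials


if __name__ == "__main__":
    d, k = 24, 8
    r = 1 << k                          # |R| = 256
    c = 2.0                             # hypothetical stand-in for c_epsilon
    size = int(c * r * math.log2(r))    # c |R| log |R| = 4096 points
    S = random.Random(7).sample(range(1 << d), size)
    print(coverage_probability(S, d, k, trials=20))
```

A map of rank below $k$ can never cover $R$, so the estimate is at most the probability that the random matrix has full rank. The interesting part of the tool is that, once $|S|$ exceeds $c_\varepsilon |R| \log |R|$, a full-rank map also fails to cover $R$ only with small probability.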
