Beating Fredman-Komlós for perfect k-hashing

Venkatesan Guruswami, Andrii Riazanov

doi:10.1016/j.jcta.2021.105580


Open Access



Journal: Journal of Combinatorial Theory, Series A
Publication Date: Jan 17, 2022
Citations: 5
License type: publisher-specific-oa

Affiliation: Carnegie Mellon University

Abstract

We say a subset $C \subseteq \{1,2,\dots,k\}^n$ is a k-hash code (also called k-separated) if for every subset of k codewords from C, there exists a coordinate where all these codewords have distinct values. Understanding the largest possible rate (in bits), defined as $(\log_2 |C|)/n$, of a k-hash code is a classical problem. It arises in two equivalent contexts: (i) the smallest size possible for a perfect hash family that maps a universe of N elements into $\{1,2,\dots,k\}$, and (ii) the zero-error capacity for decoding with lists of size less than k for a certain combinatorial channel. A general upper bound of $k!/k^{k-1}$ on the rate of a k-hash code (in the limit of large n) was obtained by Fredman and Komlós in 1984 for any $k \geq 4$. While better bounds have been obtained for $k = 4$, their original bound has remained the best known for each $k \geq 5$. In this work, we present a method to obtain the first improvement to the Fredman-Komlós bound for every $k \geq 5$.
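To see how quickly the Fredman-Komlós bound $k!/k^{k-1}$ shrinks as k grows, the short Python sketch below tabulates it with exact rational arithmetic. The function name fredman_komlos_bound is our own illustrative label, and the values it returns are the classical 1984 bound only, not the improved bounds obtained in this work.

```python
from fractions import Fraction
from math import factorial


def fredman_komlos_bound(k: int) -> Fraction:
    """Fredman-Komlos upper bound k!/k^(k-1) on the rate R_k, as an exact fraction."""
    if k < 2:
        raise ValueError("k must be at least 2")
    return Fraction(factorial(k), k ** (k - 1))


if __name__ == "__main__":
    # The abstract states the bound for k >= 4; print a few values.
    for k in range(4, 9):
        bound = fredman_komlos_bound(k)
        print(f"k = {k}: {bound} ~= {float(bound):.4f}")
```

For k = 4 and k = 5 this gives 3/8 = 0.375 and 24/125 = 0.192; each step multiplies the bound by $(k/(k+1))^{k-1} < 1$, so it decreases monotonically in k.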

Highlights

A code of length n over an alphabet of size k is a subset $C \subseteq \{1, 2, \dots, k\}^n$
The study of the quantity $R_k$, the largest asymptotic rate of a k-hash code, is a fundamental problem in combinatorics, information theory, and computer science
An upper bound on $R_k$ is equivalent to a lower bound on the size of a perfect k-hash family as a function of the universe size
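The equivalence in the last highlight can be made concrete. A perfect k-hash family of n functions on a universe of N elements becomes a k-hash code of length n with N codewords by mapping each element x to $(h_1(x), \dots, h_n(x))$; that code has rate $\log_2(N)/n$, so an upper bound R on $R_k$ forces n to be at least about $\log_2(N)/R$. The Python sketch below illustrates both steps; the helper names family_to_code and min_family_size_estimate are hypothetical, and because $R_k$ is defined in the limit of large n, the estimate holds only to leading order as N grows, not as an exact finite-n guarantee.

```python
import math
from typing import List, Sequence, Tuple


def family_to_code(family: Sequence[Sequence[int]]) -> List[Tuple[int, ...]]:
    """Turn n hash functions, each given as its list of values on 0..N-1, into
    N codewords of length n: element x becomes (h_1(x), ..., h_n(x))."""
    return [tuple(column) for column in zip(*family)]


def min_family_size_estimate(universe_size: int, rate_bound: float) -> int:
    """Leading-order lower estimate for the number n of functions in a perfect
    k-hash family on universe_size elements, given rate_bound >= R_k.

    The family's code has rate log2(N)/n, so log2(N)/n <= rate_bound suggests
    n >= log2(N)/rate_bound. R_k is an asymptotic quantity, so lower-order
    terms are ignored here.
    """
    if universe_size < 2 or rate_bound <= 0:
        raise ValueError("need universe_size >= 2 and rate_bound > 0")
    return math.ceil(math.log2(universe_size) / rate_bound)


if __name__ == "__main__":
    # Two functions from {0, 1, 2, 3} to {0, 1, 2} forming a perfect 3-hash family.
    family = [(0, 1, 2, 2), (0, 0, 1, 2)]
    print(family_to_code(family))  # [(0, 0), (1, 0), (2, 1), (2, 2)]
    # k = 4 with the Fredman-Komlos value 3/8, universe of 2**30 elements.
    print(min_family_size_estimate(2**30, 0.375))  # 80
```

For instance, plugging in the Fredman-Komlós value 3/8 for k = 4 and a universe of $2^{30}$ elements gives an estimate of 80 functions; any improvement to the rate bound raises this estimate proportionally.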

Summary

Introduction

A code of length n over an alphabet of size k is a subset $C \subseteq \{1, 2, \dots, k\}^n$. We say such a code C is a k-hash code (also called k-separated in the literature) if for every subset of k distinct codewords $\{c^{(1)}, c^{(2)}, \dots, c^{(k)}\}$ from C, there exists a coordinate j such that all these codewords differ in this coordinate, i.e. $\{c^{(1)}_j, c^{(2)}_j, \dots, c^{(k)}_j\} = \{1, 2, \dots, k\}$.

Let $R_k = \limsup_{n \to \infty} \max_C (\log_2 |C|)/n$ denote the largest asymptotic rate, where the maximum is over k-hash codes $C \subseteq \{1, \dots, k\}^n$. $R_k$ also gives the growth rate of the size of universes for which perfect k-hash families of a given size exist. Determining $R_k$, and with it the minimum size of perfect k-hash families, is a longstanding problem. For k = 3 (called the trifference problem by Körner), $R_3 \leq \log_2(3/2) \approx 0.585$ remains the best known upper bound, and improving it (or showing that it can be achieved!) is a major combinatorial challenge.

Using exactly the same arguments as for k-hash codes, we also obtain an improvement on the Körner-Marton upper bound [9] on the rate of (b, k)-hash codes, that is, codes over an alphabet of size $b \geq k$ in which every k distinct codewords are all distinct in some coordinate. For some small pairs of values (b, k) with b > k, the Körner-Marton bound was further improved by Arikan [1]; in those cases, the bounds we get are probably weaker than Arikan's.
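The definition of a k-hash code can be checked by brute force on very small examples. The Python sketch below is an illustration under stated assumptions, not the authors' method: it relabels the alphabet as {0, ..., k-1}, enumerates every k-subset of codewords (so it is only practical for tiny codes), and the names is_k_hash and rate are hypothetical.

```python
from itertools import combinations
from math import log2
from typing import Sequence, Tuple

Codeword = Tuple[int, ...]


def is_k_hash(code: Sequence[Codeword], k: int) -> bool:
    """Brute-force test of the k-hash property.

    Assumption: symbols are 0..k-1 rather than 1..k (a relabelling that does
    not change the property). Cost grows like C(|C|, k) * n, so tiny codes only.
    """
    if not code:
        return True
    n = len(code[0])
    if any(len(c) != n for c in code):
        raise ValueError("all codewords must have the same length n")
    if any(s not in range(k) for c in code for s in c):
        raise ValueError("symbols must lie in {0, ..., k-1}")
    if len(set(code)) != len(code):
        raise ValueError("codewords must be distinct")
    for subset in combinations(code, k):
        # The k codewords must take k distinct values in at least one coordinate j.
        if not any(len({c[j] for c in subset}) == k for j in range(n)):
            return False
    return True


def rate(code: Sequence[Codeword]) -> float:
    """Rate in bits, (log2 |C|) / n, for a non-empty code of length n."""
    return log2(len(code)) / len(code[0])


if __name__ == "__main__":
    small = [(0, 0), (1, 0), (2, 1), (2, 2)]
    full = [(a, b) for a in range(3) for b in range(3)]
    print(is_k_hash(small, 3), rate(small))  # True 1.0
    print(is_k_hash(full, 3))                # False
```

Here the four words (0,0), (1,0), (2,1), (2,2) form a 3-hash code of length 2 with rate 1, while all nine words of $\{0,1,2\}^2$ do not, since (0,0), (0,1), (1,0) repeat a symbol in both coordinates. A rate of 1 at n = 2 does not contradict $R_3 \leq \log_2(3/2)$, because $R_k$ only constrains the rate as n grows.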

Background and approach

Upper bound on the rate of k-hash codes

Unbalanced case

Almost balanced case

Improvement of the Fredman-Komlós bound