We study lower bounds for Locality-Sensitive Hashing (LSH) in the strongest setting: point sets in {0,1} d under the Hamming distance. Recall that H is said to be an ( r , cr , p , q )-sensitive hash family if all pairs x , y ∈ {0,1} d with dist( x , y ) ≤ r have probability at least p of collision under a randomly chosen h ∈ H, whereas all pairs x , y ∈ {0, 1} d with dist( x , y ) ≥ cr have probability at most q of collision. Typically, one considers d → ∞, with c > 1 fixed and q bounded away from 0. For its applications to approximate nearest-neighbor search in high dimensions, the quality of an LSH family H is governed by how small its ρ parameter ρ = ln(1/ p )/ln(1/ q ) is as a function of the parameter c . The seminal paper of Indyk and Motwani [1998] showed that for each c ≥ 1, the extremely simple family H = { x ↦ x i : i ∈ [ d ]} achieves ρ ≤ 1/ c . The only known lower bound, due to Motwani et al. [2007], is that ρ must be at least ( e 1/c - 1)/( e 1/c + 1) ≥ .46/ c (minus o d (1)). The contribution of this article is twofold. (1) We show the “optimal” lower bound for ρ : it must be at least 1/ c (minus o d (1)). Our proof is very simple, following almost immediately from the observation that the noise stability of a boolean function at time t is a log-convex function of t . (2) We raise and discuss the following issue: neither the application of LSH to nearest-neighbor search nor the known LSH lower bounds hold as stated if the q parameter is tiny. Here, “tiny” means q = 2 -Θ(d) , a parameter range we believe is natural.
Read full abstract