Sampling a Near Neighbor in High Dimensions — Who is the Fairest of Them All?

Martin Aumüller, Rasmus Pagh, Sepideh Mahabadi, Francesco Silvestri, Sariel Har-Peled

doi:10.1145/3502867

Abstract

Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. Given a set of points S and a radius parameter r > 0, the r-near neighbor (r-NN) problem asks for a data structure that, given any query point q, returns a point p within distance at most r from q. In this paper, we study the r-NN problem in the light of individual fairness and providing equal opportunities: all points that are within distance r from the query should have the same probability of being returned. In the low-dimensional case, this problem was first studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the theoretically strongest approach to similarity search in high dimensions, does not provide such a fairness guarantee. In this work, we show that LSH-based algorithms can be made fair without a significant loss in efficiency. We propose several efficient data structures for the exact and approximate variants of the fair NN problem. Our approach works more generally for sampling uniformly from a sub-collection of sets of a given collection and can be used in a few other applications. We also develop a data structure for fair similarity search under inner product that requires nearly-linear space and exploits locality sensitive filters. The paper concludes with an experimental evaluation that highlights the unfairness of state-of-the-art NN data structures and shows the performance of our algorithms on real-world datasets.
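
The fairness requirement stated in the abstract can be illustrated with a brute-force baseline. The sketch below is not the paper's LSH-based data structure: it simply scans all of S, collects every point within distance r of q, and returns one of them uniformly at random, which costs time linear in |S| per query. Euclidean distance and the names `euclidean` and `fair_near_neighbor` are assumptions made for illustration only.

```python
import math
import random
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(p: Point, q: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def fair_near_neighbor(
    points: Sequence[Point],
    q: Point,
    r: float,
    rng: Optional[random.Random] = None,
) -> Optional[Point]:
    """Return a uniformly random point of `points` within distance r of q.

    Brute-force illustration of the fair r-NN definition: every point in
    the ball of radius r around q is returned with the same probability.
    Returns None when no point lies within distance r.
    """
    rng = rng if rng is not None else random.Random()
    ball = [p for p in points if euclidean(p, q) <= r]
    if not ball:
        return None
    return rng.choice(ball)


if __name__ == "__main__":
    S = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (3.0, 3.0)]
    # Three points of S lie within distance 1 of the origin; each should
    # be drawn about equally often over many queries.
    print(fair_near_neighbor(S, (0.0, 0.0), 1.0))
```

A standard r-NN structure is only required to return some point in the ball, so it may favor the same point on every query; the baseline above shows the stronger guarantee that the fair variant asks for.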

Journal: ACM Transactions on Database Systems
Publication Date: Mar 31, 2022
Citations: 3

Similar Papers

Fair Near Neighbor Search: Independent Range Sampling in High Dimensions
Martin Aumüller ... Francesco Silvestri
14 Jun 2020

Sampling near neighbors in search for fairness
Martin Aumüller ... Sariel Har-Peled
Communications of the ACM | VOL. 65
21 Jul 2022

Fair near neighbor search via sampling
Martin Aumüller ... Francesco Silvestri
ACM SIGMOD Record | VOL. 50
15 Jun 2021

Parameter-free locality sensitive hashing for spherical range reporting
...
04 Jan 2017
