Abstract
Regardless of the type of data, traditional Bloom filters treat each element of a set as a string and, by iterating over every character of the string, discretize all data randomly and uniformly. However, as the data size and dimensionality increase, these variants become inefficient. To better discretize vectors with high numerical dimensions, this paper replaces the string hashes with integer hashes. Based on the integer hashes and a counter array, we propose a new variant, the high-dimensional Bloom filter (HDBF), to extend the Bloom filter into high-dimensional spaces, which can represent and query numerical vectors of a big set with a low false positive probability. This paper theoretically analyzes the feasibility of the integer hashes for discretizing data and discusses the relationships among the parameters of the HDBF. The experiments illustrate that, in high-dimensional numerical spaces, the HDBF shows better randomness in distribution and entropy than the counting Bloom filter. Compared with the parallel Bloom filters, for a fixed false positive probability, the HDBF displays lower time-space overheads and is more suitable for dealing with numerical vectors of high dimensions.
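To make the construction concrete, below is a minimal sketch in Python of an HDBF-like structure as the abstract describes it: a counter array shared by k hash functions, where each hash folds the numeric components of a vector directly instead of iterating over the characters of a string encoding. The class name, the multiplicative mixing constants, and the assumption that components are integer-valued are illustrative assumptions, not taken from the paper.

```python
import random

class HighDimensionalBloomFilter:
    """Sketch of an HDBF-style counting filter over numerical vectors."""

    def __init__(self, m, k, seed=0):
        self.m = m                   # number of counter cells
        self.counters = [0] * m      # counter array, as in a counting BF
        rng = random.Random(seed)
        # one random odd multiplier per hash function
        self.multipliers = [rng.randrange(1, 2 ** 61, 2) for _ in range(k)]

    def _indexes(self, vector):
        # integer hash: fold the numeric components directly, with no
        # intermediate string encoding or per-character iteration
        for a in self.multipliers:
            h = 0
            for x in vector:
                h = (h * a + int(x)) % (2 ** 61 - 1)
            yield h % self.m

    def insert(self, vector):
        for i in self._indexes(vector):
            self.counters[i] += 1

    def query(self, vector):
        # true only if every counter is set; false positives are possible
        return all(self.counters[i] > 0 for i in self._indexes(vector))

    def delete(self, vector):
        # counters (rather than single bits) make deletion safe
        for i in self._indexes(vector):
            if self.counters[i] > 0:
                self.counters[i] -= 1

# usage sketch
hdbf = HighDimensionalBloomFilter(m=1 << 20, k=4)
hdbf.insert([3, 7, 42, 9])
print(hdbf.query([3, 7, 42, 9]))   # True
print(hdbf.query([3, 7, 42, 8]))   # False (with high probability)
```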
Highlights
In high-dimensional spaces, exact search methods, such as kd-tree approaches [1] and Q-gram [2], are only suitable for small vector sets because of their very large computational resource requirements
Since distribution and entropy reflect the discrete state of data, to check whether the high-dimensional Bloom filter (HDBF) can scatter high-dimensional vectors into different integers randomly and uniformly, this paper first compares the distribution and entropy of the HDBF with those of the counting Bloom filter (CBF) on three datasets (a sketch of one such measurement follows these highlights)
Under 10 K query vectors, the query times of the CBF and the HDBF are lower than those of the parallel Bloom filters (PBF-HT and PBF-BF); because the parallel filters must insert all dimensions into their corresponding arrays, their initialization time continues to increase, as shown in Figures 9 and 10
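The distribution/entropy comparison in the first highlight can be read as Shannon entropy over the counter array's hit distribution; the exact metric used in the paper is not quoted here, so the following Python sketch is only one plausible way to measure it.

```python
import math

def counter_entropy(counters):
    """Shannon entropy (bits) of the hit distribution over counter cells.

    A perfectly uniform scatter of insertions maximizes the entropy at
    log2(len(counters)); clustering lowers it."""
    total = sum(counters)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counters if c > 0)

# a uniform scatter vs. a skewed one over 8 cells
print(counter_entropy([5, 5, 5, 5, 5, 5, 5, 5]))   # 3.0 bits (maximal)
print(counter_entropy([35, 1, 1, 1, 1, 1, 0, 0]))  # about 0.83 bits
```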
Summary
In high-dimensional spaces, exact search methods, such as kd-tree approaches [1] and Q-gram [2], are only suitable for small vector sets because of their very large computational resource requirements. The integer-granularity locality-sensitive Bloom filter (ILBF) [32] filters objects at multiple integer distance granularities to shrink the distances and to reduce the FPP of the MLBF. All these schemes are based on LSH; according to the central limit theorem, after mapping, the LSH concentrates most elements of the set around the mean, which results in a high FPP for membership queries, especially near the mean. The modified hash functions can effectively discretize vectors with high numerical dimensions uniformly and randomly. The experiments demonstrate that the HDBF has the same discretization performance as the CBF and can efficiently handle vectors in high-dimensional numerical spaces, and that the HDBF outperforms the CBF in false positive probability, query delay, and memory cost, especially in high-dimensional numerical spaces
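The central-limit argument in the summary (an LSH projection is a sum of many terms, so its values cluster near their mean, while an integer hash that mixes components multiplicatively scatters them) can be illustrated with a toy comparison. The projection scheme, constants, and data distribution below are assumptions chosen for demonstration only, not the paper's experimental setup.

```python
import math
import random
from collections import Counter

random.seed(1)
DIM, N, W = 64, 10_000, 4.0
A = [random.gauss(0.0, 1.0) for _ in range(DIM)]  # fixed random projection
B = random.uniform(0.0, W)

def lsh_bucket(v):
    # p-stable LSH bucket: floor((a . v + b) / w); the dot product is a sum
    # of many terms, so by the central limit theorem it clusters near its mean
    return math.floor((sum(x * a for x, a in zip(v, A)) + B) / W)

def int_hash_bucket(v, buckets=1024):
    # multiplicative integer hash folding components directly (illustrative
    # stand-in for the paper's integer hashes; the constant is arbitrary)
    h = 0
    for x in v:
        h = (h * 0x9E3779B97F4A7C15 + int(x)) % (2 ** 64)
    return h % buckets

vectors = [[random.randint(0, 9) for _ in range(DIM)] for _ in range(N)]
for name, bucket in (("p-stable LSH", lsh_bucket), ("integer hash", int_hash_bucket)):
    loads = Counter(bucket(v) for v in vectors)
    top5 = sum(count for _, count in loads.most_common(5))
    print(f"{name}: {len(loads)} buckets used, "
          f"top-5 buckets hold {100 * top5 / N:.1f}% of vectors")
```

On synthetic data like this, the handful of LSH buckets around the mean absorb a large share of the vectors, while the integer hash spreads them almost evenly over the bucket range, which is the discretization property the HDBF relies on.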