Abstract

How to calculate the false positive probability of the widely used Bloom Filter (BF), from which the optimal value of <inline-formula><tex-math notation="LaTeX">$k$</tex-math></inline-formula> is conventionally derived, remains an open question. Bloom gave the false positive formula in 1970. In 2008, Bose <i>et al</i>. pointed out that Bloom's formula is flawed, and in 2010, Christensen <i>et al</i>. pointed out that Bose's formula is also flawed and gave another formula. Although Christensen's formula is perfectly accurate, it is time-consuming to evaluate, and the optimal value of <inline-formula><tex-math notation="LaTeX">$k$</tex-math></inline-formula> cannot be calculated from it. We build on the following observation: for a BF with <inline-formula><tex-math notation="LaTeX">$m$</tex-math></inline-formula> bits and <inline-formula><tex-math notation="LaTeX">$n$</tex-math></inline-formula> elements, the false positive probability is smallest if and only if the entropy of the BF is largest. From this observation, we propose the first approach to calculating the optimal <inline-formula><tex-math notation="LaTeX">$k$</tex-math></inline-formula> without any false positive formula. Furthermore, we propose a new and more accurate upper bound for the false positive probability. As the size of a Bloom Filter tends to infinity, our upper bound becomes equal to the lower bound, which becomes Bloom's formula; this deepens our understanding of that formula. In addition, we derive bounds on the correct rate of Counting Bloom Filters (CBFs) by applying our proposed formulas for BFs to them.
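The entropy observation can be illustrated with a minimal sketch. It is not the paper's derivation. It assumes independent, uniform hash functions, so that each bit stays 0 with probability (1 - 1/m)^(kn), and it takes "largest entropy" to mean that this probability equals 1/2. The function names are hypothetical, and Bloom's classical formula is used only as a cross-check.

```python
import math

def zero_bit_probability(m, n, k):
    """Probability that a given bit is still 0 after inserting n elements
    with k hashes each (assumes independent, uniform hashing)."""
    return (1.0 - 1.0 / m) ** (k * n)

def bit_entropy(p):
    """Binary entropy, in bits, of a bit that is 0 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def entropy_optimal_k(m, n):
    """Real-valued k that makes the zero-bit probability exactly 1/2,
    i.e. maximizes the per-bit entropy. No false positive formula is used."""
    return math.log(2) / (-n * math.log(1.0 - 1.0 / m))

def bloom_false_positive(m, n, k):
    """Bloom's classical (approximate) formula, used here only for comparison."""
    return (1.0 - zero_bit_probability(m, n, k)) ** k

if __name__ == "__main__":
    m, n = 1024, 100  # illustrative sizes, not taken from the paper
    k_real = entropy_optimal_k(m, n)
    # Integer k that minimizes Bloom's formula over a small search range.
    k_best = min(range(1, 21), key=lambda k: bloom_false_positive(m, n, k))
    print("entropy-optimal k (real):", k_real)
    print("per-bit entropy at that k:", bit_entropy(zero_bit_probability(m, n, k_real)))
    print("integer k minimizing Bloom's formula:", k_best)
```

Under these assumptions the entropy-based value of k lies close to the familiar (m/n) ln 2, and rounding it to an integer gives the same k that minimizes Bloom's formula for these sizes.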
