On the Evolutionary of Bloom Filter False Positives - An Information Theoretical Approach to Optimizing Bloom Filter Parameters

Zhuochen Fan, Tong Yang, Bin Cui, Alex X. Liu, Qiaobin Fu, Gang Wen, Yang Zhou, Zhipeng Huang

doi:10.1109/tkde.2022.3200045

Abstract

The fundamental issue of how to calculate the false positive probability of the widely used Bloom Filter (BF), from which the conventional wisdom is to derive the optimal value of $k$, remains elusive. Bloom gave the false positive formula in 1970; in 2008, Bose et al. pointed out that Bloom's formula is flawed; and in 2010, Christensen et al. pointed out that Bose's formula is also flawed and gave another formula. Although Christensen's formula is perfectly accurate, it is time-consuming to evaluate and cannot be used to calculate the optimal value of $k$. Based on the observation that a BF with $m$ bits and $n$ elements has the smallest false positive probability if and only if its entropy is the largest, we propose the first approach to calculating the optimal $k$ without any false positive formula. Furthermore, we propose a new and more accurate upper bound for the false positive probability. When the size of a Bloom Filter becomes infinitely large, our upper bound becomes equal to the lower bound, which is Bloom's formula; this deepens our understanding of it. In addition, we derive bounds on the correct rate of Counting Bloom Filters (CBFs) by applying our proposed formulas for BFs to them.
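The entropy observation in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' derivation: it assumes Bloom's classic (approximate) formula, $(1 - (1 - 1/m)^{kn})^k$, and treats each bit as an independent Bernoulli variable so that the filter's entropy is proportional to the binary entropy of a single bit. The function names and the search limit `k_max` are hypothetical and chosen only for illustration.

```python
import math


def bloom_fp(m: int, n: int, k: int) -> float:
    """Bloom's classic (approximate) false positive estimate.

    Assumption: bits are treated as independent, which is exactly the
    simplification that later work showed to be inexact.
    """
    p_zero = (1.0 - 1.0 / m) ** (k * n)  # probability that a given bit is still 0
    return (1.0 - p_zero) ** k


def bit_entropy(m: int, n: int, k: int) -> float:
    """Binary entropy (in bits) of one filter bit under the same assumption."""
    p_zero = (1.0 - 1.0 / m) ** (k * n)
    if p_zero <= 0.0 or p_zero >= 1.0:
        return 0.0
    p_one = 1.0 - p_zero
    return -(p_zero * math.log2(p_zero) + p_one * math.log2(p_one))


def best_k_by_fp(m: int, n: int, k_max: int = 64) -> int:
    """Integer k in [1, k_max] that minimizes the approximate false positive rate."""
    return min(range(1, k_max + 1), key=lambda k: bloom_fp(m, n, k))


def best_k_by_entropy(m: int, n: int, k_max: int = 64) -> int:
    """Integer k in [1, k_max] that maximizes the per-bit entropy."""
    return max(range(1, k_max + 1), key=lambda k: bit_entropy(m, n, k))


if __name__ == "__main__":
    m, n = 1024, 100  # illustrative sizes, not taken from the paper
    print("k minimizing approximate FP:", best_k_by_fp(m, n))
    print("k maximizing per-bit entropy:", best_k_by_entropy(m, n))
    print("textbook estimate (m/n) ln 2:", (m / n) * math.log(2))
```

Under these simplifying assumptions both searches select the $k$ for which the fraction of zero bits is closest to one half, which is consistent with the textbook estimate $k \approx (m/n)\ln 2$. The sketch says nothing about the exact false positive probability or the bounds proposed in the paper.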

Full Text