Abstract

Set synchronization is an essential task for distributed applications. In many cases, given two sets $A$ and $B$, an application needs to identify the elements that appear in $A$ but not in $B$, and vice versa. The Bloom filter, a space-efficient data structure that represents a set and supports membership queries, has been employed as a lightweight method for set synchronization with a low false positive probability. Unfortunately, Bloom filters and their variants apply only to simple sets, not to the more general multisets, in which an element may appear multiple times. In this paper, we first examine the potential of addressing the multiset synchronization problem with two existing Bloom filter variants: the invertible Bloom filter (IBF) and the counting Bloom filter (CBF). We then design a novel data structure, the invertible CBF (ICBF), which represents a multiset as a vector of cells. Each cell contains two fields, $id$ and $count$, which record the identifiers and the number of the elements mapped into that cell, respectively. Given the encodings of two multisets, the ICBF executes dedicated subtracting and decoding operations to identify both the elements that differ between the two multisets and the differences in the multiplicities of their common elements. We conduct comprehensive experiments to evaluate and compare the three multiset synchronization approaches proposed in this paper. The results indicate that the ICBF-based approach outperforms the other two in terms of synchronization accuracy, time consumption, and communication overhead.
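
To make the encode, subtract, and decode steps concrete, the following is a minimal Python sketch of a two-field cell structure in the spirit of the ICBF. It is not the paper's implementation: the abstract does not specify how the $id$ field aggregates identifiers, so this sketch assumes positive integer identifiers and stores in $id$ the multiplicity-weighted sum of the identifiers mapped to a cell. The constants `K` and `M`, the SHA-256-based partitioned hashing, and all function names are hypothetical choices made for illustration.

```python
import hashlib
from collections import Counter

K = 3   # hash functions per element (assumed value)
M = 30  # total number of cells, split into K equal partitions (assumed value)

def cells_for(x):
    """Return K distinct cell indices for element id x, one per partition."""
    part = M // K
    out = []
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{x}".encode()).digest()
        out.append(i * part + int.from_bytes(digest[:8], "big") % part)
    return out

def encode(multiset):
    """Build the cell vector; each cell is [id, count].

    Assumption of this sketch: id holds sum(multiplicity * element id).
    """
    table = [[0, 0] for _ in range(M)]
    for x, mult in multiset.items():
        for c in cells_for(x):
            table[c][0] += mult * x
            table[c][1] += mult
    return table

def subtract(ta, tb):
    """Cell-wise difference of two encodings built with the same K and M."""
    return [[a[0] - b[0], a[1] - b[1]] for a, b in zip(ta, tb)]

def decode(diff):
    """Peel cells that appear to hold a single element.

    Returns ({element id: signed multiplicity difference}, fully_decoded).
    A positive difference means the element occurs more often in the first
    multiset; a negative one, more often in the second.
    """
    diff = [cell[:] for cell in diff]  # work on a copy
    result = {}
    progress = True
    while progress:
        progress = False
        for c, (id_sum, count) in enumerate(diff):
            if count == 0 or id_sum % count != 0:
                continue
            x = id_sum // count
            # Candidate must be a positive id that really hashes to this cell.
            if x <= 0 or c not in cells_for(x):
                continue
            result[x] = result.get(x, 0) + count
            for j in cells_for(x):
                diff[j][0] -= count * x
                diff[j][1] -= count
            progress = True
    fully_decoded = all(cell == [0, 0] for cell in diff)
    return result, fully_decoded

# Usage: element 1 appears three times in A and once in B.
A = Counter({1: 3, 2: 1, 7: 2})
B = Counter({1: 1, 2: 1, 7: 2})
print(decode(subtract(encode(A), encode(B))))  # prints ({1: 2}, True)
```

With only two fields per cell, the purity check in this sketch is heuristic: a cell shared by several differing elements can occasionally pass it and yield a spurious element, and decoding can stall when no cell isolates a single element. The `fully_decoded` flag reports only whether every cell was reduced to zero.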
