Abstract
The goal of set reconciliation (also called set synchronization) is to identify the elements that appear in exactly one of two given sets. In this paper, we extend the set reconciliation problem with three design rationales: (i) multiset support; (ii) near-100-percent reconciliation accuracy; and (iii) low communication overhead and time cost. If realized, these three rationales will bring unprecedented benefits to the set reconciliation paradigm. Prior reconciliation methods are mainly designed for simple sets and thus remain inapplicable to multisets. Methods based on probabilistic data structures, e.g., the Counting Bloom Filter (CBF), support efficient representation and multiplicity queries, and thereby enable approximate multiset reconciliation. However, they often cannot achieve satisfactory accuracy because of potential hash collisions. Reconciliation based on logs or lists incurs high time complexity and communication overhead. Therefore, existing reconciliation methods fail to realize the three rationales simultaneously. To this end, we redesign the Trie and the Fenwick Tree (FT) to near-accurately represent and reconcile two types of multisets, which we refer to as unsorted and sorted multisets, respectively. Moreover, to further reduce the communication overhead during reconciliation, we design a partial transmission strategy for exchanging two Tries or FTs. Comprehensive evaluations are conducted to quantify the performance of our proposals. The trace-driven evaluations demonstrate that the Trie and the FT achieve near-accurate multiset reconciliation while running 4.31 and 2.96 times faster than the CBF-based method, respectively. The simulations based on synthetic datasets further indicate that our proposals outperform the CBF-based method in accuracy and communication overhead most of the time.
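To make the problem statement concrete, the following minimal sketch shows what an exact multiset reconciliation is expected to output when both multisets are available locally. It is only an illustration of the target result, not the Trie-based or FT-based method proposed in the paper; the function name `reconcile_multisets` and the sample data are hypothetical.

```python
from collections import Counter


def reconcile_multisets(a, b):
    """Return the surplus multiplicities held by each side.

    Illustrative baseline only (hypothetical helper): it assumes both
    multisets are fully available in one place, which is exactly what a
    communication-efficient reconciliation scheme tries to avoid.
    """
    count_a, count_b = Counter(a), Counter(b)
    # Counter subtraction keeps only elements with a positive remaining count.
    only_in_a = count_a - count_b
    only_in_b = count_b - count_a
    return only_in_a, only_in_b


# Sample multisets: "x" appears twice in a, "y" appears twice in b.
a = ["x", "x", "y", "z"]
b = ["x", "y", "y", "w"]
only_a, only_b = reconcile_multisets(a, b)
```

Unlike simple-set reconciliation, the result carries multiplicities: an element present on both sides can still differ if its counts differ.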
More From: IEEE Transactions on Knowledge and Data Engineering