Abstract

This paper addresses the design of a compact data structure for multi-set membership testing that supports fast queries. Multi-set membership testing, that is, determining which of several sets (if any) a given element belongs to, is a fundamental operation in computing systems. Most existing schemes for multi-set membership testing are built on Bloom filters and fall short in either storage cost or query speed. To address this issue, we propose three schemes: the Noisy Bloom Filter (NBF), the Error Corrected Noisy Bloom Filter (NBF-E), and the Data-driven Noisy Bloom Filter (NBF-D). We optimize their misclassification and false positive rates through theoretical analysis and present criteria for choosing among NBF, NBF-E, and NBF-D. The key novelty of the three schemes is to store set ID information in a compact but noisy form that allows fast recording and querying, and to apply a denoising method at query time. In particular, NBF-E incorporates asymmetric error-correcting coding techniques into NBF, and NBF-D encodes set IDs based on set cardinality. To evaluate NBF, NBF-E, and NBF-D against the prior art, we conducted experiments using real-world network traces. The results show that NBF, NBF-E, and NBF-D significantly advance the state of the art in multi-set membership testing.
