Abstract
This paper concerns the problem of finding persistent items in distributed datasets, which has many applications such as port scanning and intrusion detection. To the best of our knowledge, there is no existing solution for finding persistent items in distributed datasets. In this paper, we propose DISPERSE, a probabilistic algorithm that can find persistent items in distributed datasets without collecting all the datasets. Our basic idea is that each monitor compresses each item ID in its dataset in a lossy fashion, sends the set of lossily compressed item IDs to the server, then the server recovers the IDs of the persistent items. We design the lossy compression so that given one lossily compressed item ID, the server cannot recover the item ID, but when the number of lossily compressed versions of the same item ID exceeds a threshold, which means that such items are persistent ones, the server can recover the item ID with a high probability. This threshold is exactly the threshold in the definition of persistent items. We implemented DISPERSE and evaluated its performance. In comparison with the straightforward solution, DISPERSE achieves a compression ratio of 26.5% with FNR=3.5% and FPR=0. In comparison with a developed Bloom filter based scheme and the adapted kBF and IBF schemes, our scheme can achieve 7.9, 5.7, and 6.6 times performance gains, respectively, in terms of compression ratio.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.