Abstract

In this paper, we consider the problem of thresholded monitoring in distributed data streams: given multiple distributed data streams observed by multiple monitors during a certain period, find the items whose global frequencies over all data streams exceed a given threshold. We first derive a lower bound on the communication overhead of any deterministic algorithm for this problem. We then propose two schemes: the Low-threshold Cascaded Cuckoo Filter (L-CCF) for low-threshold monitoring and the High-threshold Cascaded Cuckoo Filter (H-CCF) for high-threshold monitoring. L-CCF and H-CCF identify items whose frequencies exceed the given threshold while achieving a desired false negative rate (FNR) and optimizing communication overhead. The key idea is to compress, at the same time, the communication overhead caused by transferring both ID and frequency information. First, to reduce the overhead of transferring IDs, we encode each ID into separate tiny parts and store these parts in L-CCF or H-CCF. Second, to reduce the overhead of transferring frequencies, we adopt a carry-in counter technique in L-CCF and a multiple sampling technique in H-CCF. We evaluated L-CCF and H-CCF on two real-world traces and compared their performance with two adapted prior algorithms. Our experimental results show that, on average, L-CCF and H-CCF achieve FNRs that are 55% and 65% better, respectively, than those of the comparison algorithms, while the false positive rate (FPR) is maintained at the level of 2%.
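
To make the problem statement concrete, the following is a minimal sketch of a naive baseline, not the L-CCF or H-CCF schemes described above. It assumes each monitor ships its complete table of (ID, count) pairs to a single coordinator, which sums them and reports the items whose global frequency exceeds the threshold. The function names, the toy streams, and the use of pair count as a stand-in for communication overhead are all illustrative assumptions.

```python
from collections import Counter


def local_counts(stream):
    """Count item frequencies observed by one monitor (hypothetical helper)."""
    return Counter(stream)


def naive_thresholded_monitoring(streams, threshold):
    """Naive baseline: every monitor sends all of its (ID, count) pairs.

    Returns the set of items whose global frequency exceeds `threshold`
    and the number of (ID, count) pairs sent to the coordinator, used here
    as a rough proxy for communication overhead.
    """
    global_counts = Counter()
    pairs_sent = 0
    for stream in streams:
        counts = local_counts(stream)
        pairs_sent += len(counts)     # one (ID, count) pair per distinct item
        global_counts.update(counts)  # coordinator sums the local counts
    heavy = {item for item, freq in global_counts.items() if freq > threshold}
    return heavy, pairs_sent


# Toy example: three monitors, each observing a short stream.
streams = [
    ["a", "b", "a", "c"],
    ["a", "b", "d"],
    ["b", "a", "e"],
]
heavy, pairs_sent = naive_thresholded_monitoring(streams, threshold=2)
print(sorted(heavy), pairs_sent)
```

In this baseline every distinct item at every monitor costs a full ID plus a full counter, including items that are nowhere near the threshold. That cost is what the abstract's ID encoding and frequency compression aim to reduce.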

