Abstract

Sketches, a class of probabilistic data structures, are widely used to process high-volume, high-speed data streams. At the cost of a small loss of accuracy in frequency estimation, they achieve high speed with a small memory footprint. However, skewed data streams make it hard for existing sketches to remain both accurate and fast under limited memory. To address this issue, we propose a framework, called HeavySeparation, that enhances existing sketches by filtering elephant flows efficiently and accurately. We adopt a power-weakening increment strategy that allows sufficient competition in the early stages of identifying elephant flows and amplifies the relative advantage of a candidate flow once its frequency is large. To verify the effectiveness and efficiency of our framework, we apply it to two typical sketches and two common stream processing tasks. Results show that the HeavySeparation framework reduces the error in frequency estimation by around 1–2 orders of magnitude on average compared with the state of the art.
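The general idea of separating elephant flows from a backing sketch can be illustrated with a minimal sketch. This is not the HeavySeparation algorithm itself: the abstract does not give the update rule, so the class names `CountMin` and `HeavyFilter`, the parameters `buckets`, `alpha`, and `seed`, and the decay probability `count ** -alpha` are all assumptions chosen for illustration. The assumed rule weakens an incumbent flow with a probability that falls off as a power of its counter, so small counters are contested freely while large ones are rarely displaced.

```python
import random
import zlib


class CountMin:
    """Plain Count-Min sketch, used here as the backing sketch for non-heavy traffic."""

    def __init__(self, width=64, depth=3):
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One deterministic hash per row, derived from CRC32 with a row prefix.
        return zlib.crc32(f"{row}:{key}".encode()) % self.width

    def add(self, key, count=1):
        for r, row in enumerate(self.rows):
            row[self._index(key, r)] += count

    def query(self, key):
        return min(row[self._index(key, r)] for r, row in enumerate(self.rows))


class HeavyFilter:
    """Hypothetical heavy part: one (owner, count) slot per bucket.

    ASSUMPTION: on a collision the incumbent loses one unit with probability
    count ** -alpha. This power-law rule is illustrative only and is not
    taken from the paper.
    """

    def __init__(self, backend, buckets=16, alpha=1.0, seed=0):
        self.backend = backend
        self.alpha = alpha
        self.slots = [[None, 0] for _ in range(buckets)]
        self.rng = random.Random(seed)

    def _slot(self, key):
        return self.slots[zlib.crc32(f"heavy:{key}".encode()) % len(self.slots)]

    def add(self, key):
        slot = self._slot(key)
        if slot[0] is None or slot[0] == key:
            # Empty slot or a hit on the current owner: count it in the heavy part.
            slot[0] = key
            slot[1] += 1
            return
        # Collision: weaken the incumbent with a probability that shrinks
        # as its counter grows (always succeeds when the counter is 1).
        if self.rng.random() < slot[1] ** -self.alpha:
            self.backend.add(slot[0])  # hand the removed unit to the backing sketch
            slot[1] -= 1
            if slot[1] == 0:
                # Incumbent is exhausted: the newcomer takes over the slot.
                slot[0], slot[1] = key, 1
                return
        # The newcomer lost the competition: record it in the backing sketch.
        self.backend.add(key)

    def query(self, key):
        slot = self._slot(key)
        held = slot[1] if slot[0] == key else 0
        return held + self.backend.query(key)
```

In this sketch every unit removed from the heavy part is moved into the backing sketch rather than dropped, so the combined estimate from `HeavyFilter.query` never falls below the true count. That conservation choice is a design decision of this example, made to keep the Count-Min one-sided error guarantee; the paper may handle evictions differently.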
