Abstract

Real-time data stream processing is key to many Internet applications ranging from e-commerce, social networks, to network monitoring. Sketches (compact data structures) are often used to track large, high-rate streams and estimate their statistics. However, the traditional model for measuring flow size, e.g., number of packets in each flow, is unsuitable for more sophisticated applications that require flow spreads, i.e., number of distinct elements in each flow. When there are numerous flows, most existing work modifies CountMin sketches — which were designed to measure flow size — for measuring flow spread. They inherit a similar design and a commonly-used min operation from the CountMin sketches. This paper casts doubt on such a solution path, arguing that the inherited sketch design and the min operation were both intended for a different purpose and are not the best for flow spread measurement. Replacing the min operation with new position-aware operations, we propose the kJoin sketches that achieve much better average accuracy. We mathematically derive the estimation error bound. We also apply our sketches for network-wide measurement. Trace-driven experiments on software/GPU/FPGA platforms demonstrate our work significantly improves estimation accuracy, streaming rate (recording throughput), and query throughput, compared to the state-of-the-art.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call