SuperGuardian: Superspreader removal for cardinality estimation in data streaming

Jie Lu, Hongchang Chen, Penghao Sun, Tao Hu, Zhen Zhang, Quan Ren

doi:10.1016/j.is.2024.102351


https://doi.org/10.1016/j.is.2024.102351


Journal: Information Systems

Publication Date: Feb 17, 2024


Abstract


Measuring flow cardinality is one of the fundamental problems in data stream mining, where a data stream is modeled as a sequence of items from different flows and the cardinality of a flow is the number of distinct items in that flow. Many existing sketches based on estimator sharing have been proposed to handle huge flows in data streams. However, these sketches use memory inefficiently because they allocate the same memory size to every estimator without considering the skewed cardinality distribution. To address this issue, we propose SuperGuardian to improve the memory efficiency of existing sketches. SuperGuardian separates high-cardinality flows from the data stream and keeps their information in large estimators, while using existing sketches with small estimators to record low-cardinality flows. We present a mathematical analysis of the cardinality estimation error of SuperGuardian. To validate our proposal, we implemented SuperGuardian and evaluated it experimentally on real traffic traces. The experimental results show that existing sketches using SuperGuardian reduce error by 79%–96% and increase throughput by 0.3–2.3 times.
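
The separation idea described in the abstract can be illustrated with a minimal two-tier sketch. This is not the authors' algorithm: the abstract does not say how high-cardinality flows are detected or stored, so the promotion rule, the `threshold`, the bitmap sizes, and the class names `Bitmap` and `TwoTierSketch` are all assumptions made for illustration. For simplicity, the example also keeps one small estimator per flow in a dictionary, rather than sharing estimators among flows as the existing sketches do.

```python
import hashlib
import math


def _hash(value: str, seed: int) -> int:
    """Deterministic 64-bit hash so that runs are reproducible."""
    digest = hashlib.blake2b(
        value.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


class Bitmap:
    """Linear-counting estimator: n is approximately -m * ln(zero fraction)."""

    def __init__(self, bits: int):
        self.bits = bits
        # Indices of bits set to 1. A set keeps the example short;
        # a real sketch would use a packed bit array.
        self.set_bits = set()

    def add(self, item: str) -> None:
        self.set_bits.add(_hash(item, 1) % self.bits)

    def estimate(self) -> float:
        zeros = self.bits - len(self.set_bits)
        if zeros == 0:
            # Saturated bitmap: return the usual upper-bound fallback.
            return self.bits * math.log(self.bits)
        return -self.bits * math.log(zeros / self.bits)


class TwoTierSketch:
    """Hypothetical two-tier layout: small estimators for every flow,
    plus a large estimator for flows that outgrow the small one."""

    def __init__(self, small_bits=32, large_bits=4096, threshold=16.0):
        self.small_bits = small_bits
        self.large_bits = large_bits
        self.threshold = threshold  # assumed promotion rule, not from the paper
        self.small = {}  # flow -> small Bitmap
        self.large = {}  # flow -> large Bitmap (promoted flows only)

    def add(self, flow: str, item: str) -> None:
        if flow in self.large:
            # Promoted flows are recorded only in the large estimator.
            self.large[flow].add(item)
            return
        est = self.small.setdefault(flow, Bitmap(self.small_bits))
        est.add(item)
        if est.estimate() >= self.threshold:
            # Promote: later items of this flow go to a large estimator.
            self.large[flow] = Bitmap(self.large_bits)

    def estimate(self, flow: str) -> float:
        total = 0.0
        if flow in self.small:
            total += self.small[flow].estimate()
        if flow in self.large:
            total += self.large[flow].estimate()
        return total


sketch = TwoTierSketch()
for i in range(1000):
    sketch.add("big", f"item-{i}")      # high-cardinality flow
for i in range(5):
    sketch.add("small", f"item-{i}")    # low-cardinality flow
    sketch.add("small", f"item-{i}")    # duplicates do not change the count

print(round(sketch.estimate("big")), round(sketch.estimate("small")))
```

In this sketch only the promoted flow pays for a 4096-bit estimator, while every other flow stays at 32 bits, which is the memory argument the abstract makes about skewed cardinality distributions. One known simplification: an item seen both before and after promotion is counted in both tiers, so the sum can overestimate. The abstract does not describe how SuperGuardian handles this.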
