SF-Sketch: A Two-Stage Sketch for Data Streams

Lingtong Liu, Yibo Yan, Gaogang Xie, Tong Yang, Yulong Shen, Muhammad Shahzad, Bin Cui

doi:10.1109/tpds.2020.2987609

Abstract

Sketches are probabilistic data structures designed for recording the frequencies of items in a multi-set. They are widely used in various fields, especially for gathering Internet statistics from distributed data streams in network measurements. In a distributed streaming application with high data rates, the sketch in each monitoring node “fills up” very quickly, and its content is then transferred to a remote collector responsible for answering queries. Thus, the size of the transferred content must be kept as small as possible while still meeting the desired accuracy requirement. To obtain significantly higher accuracy while keeping the same update and query speed as the best prior sketches, in this article we propose a new sketch, the Slim-Fat (SF) sketch. The key idea behind the SF-sketch is to maintain two separate sketches: a larger sketch, the Fat-subsketch, and a smaller sketch, the Slim-subsketch. The Fat-subsketch is used for updating and for periodically producing the Slim-subsketch, which is then transferred to the remote collector for answering queries quickly and accurately. We also present the error bound as well as an accurate model of the correct rate of the SF-sketch, and verify their correctness through experiments. We implemented and extensively evaluated the SF-sketch along with several prior sketches. Our results show that when the Slim-subsketch and the widely used Count-Min (CM) sketch are kept the same size, the SF-sketch outperforms the CM-sketch by up to 33.1 times in terms of accuracy (when the ratio of the sizes of the Fat-subsketch and the Slim-subsketch is 16:1). We have made all source code publicly available on GitHub at https://github.com/paper2017/SF-sketch.
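
For orientation, the Count-Min (CM) sketch named above as the baseline can be sketched in a few lines of Python. This is a generic CM sketch under stated assumptions, not the SF-sketch and not the authors' code: the class name `CountMinSketch`, the default `depth` and `width`, and the use of salted BLAKE2b as the hash family are all illustrative choices that do not come from the abstract.

```python
import hashlib


class CountMinSketch:
    """Generic Count-Min sketch (illustrative baseline, not the SF-sketch)."""

    def __init__(self, depth=4, width=1024):
        # depth = number of counter arrays, width = counters per array.
        # Both defaults are assumptions chosen for the example.
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One hash function per row, derived by salting BLAKE2b with the
        # row number (an assumed hash family, picked for determinism).
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item, count=1):
        # Increment the one mapped counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(row, item)] += count

    def query(self, item):
        # The minimum mapped counter is the frequency estimate. Collisions
        # can only inflate counters, so the estimate never undercounts.
        return min(
            self.rows[row][self._index(row, item)]
            for row in range(self.depth)
        )


if __name__ == "__main__":
    cms = CountMinSketch()
    for _ in range(3):
        cms.update("10.0.0.1")
    print(cms.query("10.0.0.1"))  # at least 3; never less than the true count
```

In the two-stage design described in the abstract, a structure of this kind is what the Slim-subsketch is compared against at equal size; how the Fat-subsketch guides the construction of the Slim-subsketch is specified in the paper itself and is not reproduced here.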

Journal: IEEE Transactions on Parallel and Distributed Systems	Publication Date: Oct 1, 2020
Citations: 14	License type: publisher-specific, author manuscript
