On Frequency Estimation and Detection of Heavy Hitters in Data Streams

Federica Ventruto, Marco Pulimeno, Massimo Cafaro, Italo Epicoco

doi:10.3390/fi12090158

Abstract

A stream can be thought of as a very large, possibly infinite, set of data that arrives sequentially and must be processed without being stored. Because the memory available to the algorithm is limited, the whole stream cannot be kept; instead, it is scanned upon arrival and summarized through a succinct data structure that maintains only the information of interest. Two of the main tasks in data stream processing are frequency estimation and heavy hitter detection. Frequency estimation requires estimating the frequency of each item, that is, the number of times, or the weight with which, it appears in the stream; heavy hitter detection requires finding all items whose frequency exceeds a fixed threshold. In this work we design and analyze ACMSS, an algorithm for frequency estimation and heavy hitter detection, and compare it against the state-of-the-art ASketch algorithm. We show that, given the same memory budget, our algorithm outperforms ASketch in accuracy for frequency estimation. Furthermore, we show that, under the assumptions stated by its authors, ASketch may fail to report all of the heavy hitters, whereas ACMSS provides the full list of heavy hitters with high probability.

Highlights

In the data stream model, data arrives or can be accessed only sequentially and in a given order; no random access to the data is allowed
In this paper we are concerned with the problems of frequency estimation and frequent item detection in data streams
We compare our algorithm to ASKETCH [10], the state-of-the-art algorithm for these problems, and show through extensive experiments that ACMSS achieves better accuracy than ASKETCH for frequency estimation

Summary

Introduction

In the data stream model, data arrives or can be accessed only sequentially and in a given order; no random access to the data is allowed. Queries about the data stream are answered using a succinct summary maintained while the stream is scanned, and the time for processing an item and computing the answer to a given query is limited. Two of the most important and well-studied problems in the field of Data Mining are frequency estimation of data stream items and the detection of heavy hitters, also known as frequent items. In this paper we are concerned with the problems of frequency estimation and frequent item detection in data streams. We compare our algorithm to ASKETCH [10], the state-of-the-art algorithm for these problems, and show through extensive experiments that ACMSS achieves better accuracy than ASKETCH for frequency estimation.
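To make the two problems concrete, the following minimal sketch contrasts an exact (unbounded-memory) reference with a fixed-size summary. The summary is a generic Count-Min-style counter array, a standard technique swapped in purely for illustration; it is not the ACMSS or ASKETCH algorithm, and the class name, the parameters `depth` and `width`, the hashing scheme, and the toy stream are all assumptions of this example.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    """Generic Count-Min-style summary (illustrative; NOT ACMSS or ASKETCH)."""

    def __init__(self, depth=4, width=64):
        # depth and width are hypothetical defaults; memory is depth * width counters.
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Deterministic per-row hash: BLAKE2b salted with the row number.
        digest = hashlib.blake2b(
            repr(item).encode(),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item, weight=1):
        # Process one stream item: add its (non-negative) weight in every row.
        for r in range(self.depth):
            self.rows[r][self._index(r, item)] += weight

    def estimate(self, item):
        # Collisions only inflate counters, so the row minimum never
        # underestimates the true frequency.
        return min(self.rows[r][self._index(r, item)] for r in range(self.depth))


def exact_heavy_hitters(stream, threshold):
    """Exact reference: items whose frequency is higher than the threshold."""
    counts = Counter(stream)
    return {item for item, freq in counts.items() if freq > threshold}


# Toy stream (an assumption of this example): two frequent items plus noise.
stream = ["a"] * 50 + ["b"] * 30 + ["c"] * 5 + [f"x{i}" for i in range(15)]

sketch = CountMinSketch()
for item in stream:
    sketch.update(item)

exact = Counter(stream)
hitters = exact_heavy_hitters(stream, threshold=20)
```

Here `exact` stores one counter per distinct item, which is what the streaming model rules out for large streams, while `sketch` uses a fixed number of counters and answers `sketch.estimate(item)` with a value that is at least the true frequency. The exact set `hitters` serves only as ground truth against which an approximate detector could be checked.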

Preliminary Definitions

Related Work

The ASKETCH Algorithm

The ACMSS Algorithm

Conclusions

Journal: Future Internet
Publication Date: Sep 18, 2020
Citations: 5
License type: CC BY 4.0
