Continuous sampling from distributed streams

Graham Cormode, Qin Zhang, S Muthukrishnan, Ke Yi

doi:10.1145/2160158.2160163

Abstract

A fundamental problem in data management is to draw and maintain a sample of a large data set, for approximate query answering, selectivity estimation, and query planning. With large, streaming data sets, this problem becomes particularly difficult when the data is shared across multiple distributed sites. The main challenge is to ensure that a sample is drawn uniformly across the union of the data while minimizing the communication needed to run the protocol on the evolving data. At the same time, it is also necessary to make the protocol lightweight, by keeping the space and time costs low for each participant. In this article, we present communication-efficient protocols for continuously maintaining a sample (both with and without replacement) from k distributed streams. These apply to the case when we want a sample from the full streams, and to the sliding window cases of only the W most recent elements, or arrivals within the last w time units. We show that our protocols are optimal (up to logarithmic factors), not just in terms of the communication used, but also the time and space costs for each participant.
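
To make the problem setting concrete, here is a minimal sketch of one well-known way to keep a uniform sample without replacement over the union of k streams. It is not the protocol from the article: it swaps in a simple "smallest random tags" scheme, and the names `Coordinator`, `Site`, and `threshold` are hypothetical. Each arriving element gets an independent uniform tag, and the s elements with the smallest tags form the sample. As a simplifying assumption, sites read the coordinator's threshold directly, so the cost of broadcasting threshold updates is ignored here.

```python
import heapq
import random


class Coordinator:
    """Keeps the sample_size elements with the smallest random tags seen so far."""

    def __init__(self, sample_size):
        self.sample_size = sample_size
        self._heap = []     # max-heap on tag, stored as (-tag, seq, item)
        self._seq = 0       # tie-breaker so items themselves are never compared
        self.messages = 0   # number of site-to-coordinator messages

    @property
    def threshold(self):
        # Until the sample is full, every element has to be forwarded.
        if len(self._heap) < self.sample_size:
            return 1.0
        return -self._heap[0][0]  # largest tag currently in the sample

    def receive(self, tag, item):
        self.messages += 1
        entry = (-tag, self._seq, item)
        self._seq += 1
        if len(self._heap) < self.sample_size:
            heapq.heappush(self._heap, entry)
        elif tag < self.threshold:
            # Evict the element with the largest tag and admit the new one.
            heapq.heapreplace(self._heap, entry)

    def sample(self):
        return [item for _, _, item in self._heap]


class Site:
    """One stream participant (hypothetical interface)."""

    def __init__(self, coordinator, rng):
        self.coordinator = coordinator
        self.rng = rng

    def observe(self, item):
        tag = self.rng.random()
        # Assumption: the site sees the current threshold for free. A real
        # protocol would also have to pay for distributing threshold updates.
        if tag < self.coordinator.threshold:
            self.coordinator.receive(tag, item)


# Usage: 4 sites, 10,000 arrivals in round-robin order, sample size 5.
rng = random.Random(0)
coord = Coordinator(sample_size=5)
sites = [Site(coord, rng) for _ in range(4)]
for i in range(10_000):
    sites[i % 4].observe(i)
print(coord.sample(), coord.messages)
```

Because the tags are independent and identically distributed, the s smallest tags pick out a uniformly random size-s subset of everything seen so far, and most arrivals are filtered at the site without any message. Handling the sliding-window cases mentioned in the abstract would need extra machinery that this sketch does not attempt.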

Journal: Journal of the ACM
Publication Date: Apr 1, 2012
Citations: 101
