Near-Optimal Approximate Duplicate-Detection in Data Streams Over Sliding Windows for the Uniform Query Frequency or Membership Likelihood

Xiujun Wang, Xiao Zheng, Zhe Dang, Xuangou Wu, Baohua Zhao

doi:10.1109/.53

Abstract

Approximate duplicate-detection (or membership query) in data streams answers the question of whether an element from a large universe U (a query element) is present in a small subsequence of a data stream. It is an important query with many Internet applications, such as web crawling and social networks. Existing approximate duplicate-detection methods in the sliding window model are not memory-efficient, because they do not incorporate information on the query frequencies and membership likelihoods of the elements of U into their data structure design, even though this information can be obtained with well-developed techniques. In this paper, assuming that either the query frequency or the membership likelihood is uniform over all elements of U, we adopt a block-wise updating strategy to design a memory-efficient data structure, called the cell Bloom filter (CEBF), and an approximate duplicate-detection algorithm based on CEBF. If the average false positive rate is $\epsilon$ and the sliding window size is $n$, our method uses $2\log_2(e)\, n \left(\log_2 \frac{1}{\epsilon} + 1\right)$ bits, which is much less than other existing algorithms require. Experimental results on synthetic data verify the effectiveness of our method.
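The abstract does not describe the internal layout of CEBF, so the following is not the authors' method. It is a minimal, hypothetical sketch of the general block-wise updating idea for sliding-window duplicate detection: the window of the last n elements is split into blocks, each block has its own ordinary Bloom filter, and the oldest block's filter is discarded as a whole when a new block fills. The class names `BloomFilter` and `BlockwiseSlidingBloom` and the parameters `n_blocks`, `bits_per_block`, and `k_hashes` are assumptions made for illustration, and the sketch does not exploit uniform query frequencies or membership likelihoods. The helper `cebf_bits` only evaluates the bit count $2\log_2(e)\, n (\log_2 \frac{1}{\epsilon} + 1)$ quoted above; the sketch itself makes no attempt to meet that bound.

```python
import hashlib
import math
from collections import deque


def cebf_bits(n: int, eps: float) -> float:
    """Bit count stated in the abstract: 2*log2(e)*n*(log2(1/eps) + 1)."""
    return 2 * math.log2(math.e) * n * (math.log2(1 / eps) + 1)


class BloomFilter:
    """Plain Bloom filter with m bits and k hash positions (double hashing)."""

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str) -> list[int]:
        # Derive k bit positions from one SHA-256 digest: h1 + i*h2 (mod m).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


class BlockwiseSlidingBloom:
    """Hypothetical block-wise sliding-window duplicate detector (not CEBF).

    The window of the last n elements is cut into n_blocks blocks of
    n // n_blocks elements. Each block gets its own Bloom filter; when the
    current block fills, it is frozen, and once n_blocks frozen blocks are
    held, the oldest one is dropped.
    """

    def __init__(self, n: int, n_blocks: int, bits_per_block: int, k_hashes: int) -> None:
        if n % n_blocks:
            raise ValueError("n must be a multiple of n_blocks")
        self.block_size = n // n_blocks
        self.bits_per_block = bits_per_block
        self.k = k_hashes
        # n_blocks frozen blocks plus the current one always cover the last
        # n elements, so duplicates inside the window are never missed.
        self.frozen = deque(maxlen=n_blocks)
        self.current = BloomFilter(bits_per_block, k_hashes)
        self.filled = 0  # elements inserted into the current block

    def query(self, item: str) -> bool:
        return item in self.current or any(item in b for b in self.frozen)

    def insert(self, item: str) -> bool:
        """Return True if item is (probably) a duplicate, then record it."""
        duplicate = self.query(item)
        self.current.add(item)
        self.filled += 1
        if self.filled == self.block_size:
            self.frozen.append(self.current)  # maxlen evicts the oldest block
            self.current = BloomFilter(self.bits_per_block, self.k)
            self.filled = 0
        return duplicate
```

Feeding the stream a, b, a, c, d, e, b, a into `BlockwiseSlidingBloom(n=4, n_blocks=2, bits_per_block=4096, k_hashes=3)` reports the second a as a duplicate, and also the final a, even though that a has already left the 4-element window: the block holding it has not yet been discarded. This stale-block error is the usual cost of expiring whole blocks at once, and it adds to each Bloom filter's own false positives. For reference, `cebf_bits` gives roughly 22 bits per window element at $\epsilon = 0.01$.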

Similar Papers

Improved Weighted Bloom Filter and Space Lower Bound Analysis of Algorithms for Approximated Membership Querying
Xiujun Wang ... Zhe Dang
01 Jan 2015

Weighted Bloom Filter
Jehoshua Bruck ... Anxiao Jiang
01 Jul 2006

Querying Regular Languages over Sliding Windows
...
01 Jan 2015

Folded Bloom Filter for High Bandwidth Memory, with GPU Implementations
Masatoshi Hayashikawa ... Ryota Yasudo
01 Nov 2019
