A General-Purpose Counting Filter

Prashant Pandey, Rob Johnson, Michael A Bender, Rob Patro

doi:10.1145/3035918.3035963

Abstract

Approximate Membership Query (AMQ) data structures, such as the Bloom filter, quotient filter, and cuckoo filter, have found numerous applications in databases, storage systems, networks, computational biology, and other domains. However, many applications must work around limitations in the capabilities or performance of current AMQs, making these applications more complex and less performant. For example, many current AMQs cannot delete items or count the number of occurrences of each input item, take up large amounts of space, are slow, cannot be resized or merged, or have poor locality of reference and hence perform poorly when stored on SSD or disk. This paper proposes a new general-purpose AMQ, the counting quotient filter (CQF). The CQF supports approximate membership testing and counting the occurrences of items in a data set. This general-purpose AMQ is small and fast, has good locality of reference, scales out of RAM to SSD, and supports deletions, counting (even on skewed data sets), resizing, merging, and highly concurrent access. The paper reports on the structure's performance on both manufactured and application-generated data sets. In our experiments, the CQF performs in-memory inserts and queries up to an order of magnitude faster than the original quotient filter, several times faster than a Bloom filter, and similarly to the cuckoo filter, even though none of these other data structures supports counting. On SSD, the CQF outperforms all of the other structures by a factor of at least 2 because the CQF has good data locality. The CQF achieves these performance gains by restructuring the metadata bits of the quotient filter to obtain fast lookups at high load factors (i.e., even when the data structure is almost full). As a result, the CQF offers good lookup performance even up to a load factor of 95%.
Counting is essentially free in the CQF in the sense that the structure is comparable to, or more space-efficient than, even non-counting data structures (e.g., Bloom, quotient, and cuckoo filters). The paper also shows how to speed up CQF operations by using new x86 bit-manipulation instructions introduced in Intel's Haswell line of processors. The restructured metadata transforms many quotient filter metadata operations into rank-and-select bit-vector operations. Thus, our efficient implementations of rank and select may be useful for other rank-and-select-based data structures.
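
To make the rank-and-select terminology concrete, the following is a minimal Python sketch of the two operations on a single 64-bit word. It only illustrates their semantics and is not the paper's implementation: the function names `rank` and `select`, the inclusive-position convention, and the sentinel value 64 are assumptions made here for illustration. A hardware-accelerated version would replace the bit-clearing loop with the x86 bit-manipulation instructions the abstract refers to.

```python
MASK64 = (1 << 64) - 1  # treat Python ints as 64-bit words


def rank(word: int, i: int) -> int:
    """Count the 1 bits in positions 0..i (inclusive) of a 64-bit word."""
    if not 0 <= i < 64:
        raise ValueError("bit position must be in [0, 64)")
    low_bits = word & MASK64 & ((2 << i) - 1)  # keep bits 0..i only
    return bin(low_bits).count("1")


def select(word: int, k: int) -> int:
    """Return the position of the k-th 1 bit (k = 0 is the lowest set bit).

    Returns 64 (an assumed sentinel) if the word has fewer than k + 1 set bits.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    word &= MASK64
    for _ in range(k):
        word &= word - 1  # clear the lowest set bit
    if word == 0:
        return 64
    return (word & -word).bit_length() - 1  # index of the lowest remaining set bit


if __name__ == "__main__":
    w = 0b10110  # set bits at positions 1, 2 and 4
    print(rank(w, 2), select(w, 1), select(w, 3))
```

The two operations are inverses in the sense that `rank(w, select(w, k)) == k + 1` whenever the k-th set bit exists, which is the property rank-and-select-based structures rely on when navigating bit vectors.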

