Abstract

Efficient set representation and membership testing are important in various big data applications. The state-of-the-art Cuckoo filter design shows great advantages in both query efficiency and the support of item deletion, compared to previous Bloom filter and its variants. However, in this work, we show mathematically and experimentally that Cuckoo filter may suffer serious performance degradation during element insertion because of its random choice strategy for inserting an element into the candidate buckets. Such a random choice strategy incurs load imbalance among different buckets in Cuckoo filter and can lead to frequent relocations and the consequent long time for inserting an item. To solve this problem, we propose a novel design which leverages the principle of the power of two choices to select the better candidate bucket during inserting an element. Our design balances the load distribution among buckets in Cuckoo filter and avoids a large amount of relocations during insertion. We implement our design and apply it in real-world applications. We conduct comprehensive experiments using large-scale data sets collected from real world systems to evaluate the performance of this design. The results show that our design significantly reduces the average number of relocations of Cuckoo filter by 35%, as well as reducing item inserting latency by 25%.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call