Abstract
The hash table, a widely used data structure, can achieve O(1) average lookup time at the cost of large memory usage. Unfortunately, hash tables suffer from collisions, and the collision rate is largely determined by the load factor. Broadly speaking, existing research has taken two approaches to improving the performance of hash tables. The first approach trades off collision rate against memory usage, but only works well under low load. The second approach pursues high load and no hash collisions, but comes with update failures. The goal of this paper is to design a practical and efficient hash table that achieves a high load factor, a low hash collision rate, fast lookup speed, fast update speed, and zero update failures. To achieve this goal, we take a three-step approach. First, we propose a set of hashing techniques that leverage Bloom filters to significantly reduce hash collision rates. Second, we introduce a novel kick mechanism to achieve a high load factor. Last, we use bitmaps to significantly accelerate the kick mechanism. Theoretical analysis and experimental results show that our hashing schemes significantly outperform the state of the art. Our hash table achieves a high load factor (greater than 95%) and a low collision rate (less than 0.56%), with the number of hash buckets almost equal to the number of key-value pairs. Given n key-value pairs, the collision rate is reduced to 0 by either using 1.18 × n buckets or allowing up to 5 blind kicks. We have released the source code of the implementations of our hash table and of 6 prior hash tables on GitHub [1].
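To make the bucket-count figures above concrete, the following minimal sketch works out the load factor implied by using 1.18 × n buckets for n key-value pairs. It assumes that each bucket holds at most one key-value pair and that the load factor is defined as stored pairs divided by buckets; the function names `load_factor` and `buckets_for_zero_collisions` are hypothetical and are not taken from the released source code.

```python
def load_factor(num_items: int, num_buckets: int) -> float:
    """Fraction of buckets in use, assuming one key-value pair per bucket."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    return num_items / num_buckets


def buckets_for_zero_collisions(num_items: int, expansion_percent: int = 118) -> int:
    """Bucket count for the 1.18 x n setting quoted in the abstract.

    Integer arithmetic (118 percent, rounded up) avoids floating-point
    rounding surprises when multiplying by 1.18.
    """
    return (num_items * expansion_percent + 99) // 100


n = 1_000_000                              # illustrative number of key-value pairs
buckets = buckets_for_zero_collisions(n)   # 1.18 x n buckets
print(f"buckets: {buckets}, load factor: {load_factor(n, buckets):.3f}")
```

Under these assumptions, the 1.18 × n setting corresponds to a load factor of 1/1.18, or roughly 85%, which is the memory cost of removing collisions without relying on blind kicks.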