Single Hash: Use One Hash Function to Build Faster Hash Based Data Structures

Xiangyang Gou, Yang Zhou, Tong Yang, Lei Zou, Xiaoming Li, Chenxingyu Zhao, Bin Cui, Yibo Yan

doi:10.1109/bigcomp.2018.00048

Abstract

As the scale of data to be stored or monitored in today's networks keeps increasing, hash-based data structures are more and more widely used because of their high memory efficiency and high speed. Most of them, like Bloom filters, sketches and d-left hash tables, use more than one hash function. Furthermore, in order to achieve good randomness, the hash functions used, like MD5 and SHA1, are very complicated and consume a lot of CPU cycles. As a consequence, computing these hash functions is time-consuming. To address this issue, we propose the SingleHash technique in this paper. It is based on the observation that the hash functions we use produce 32-bit or 64-bit values, which have a much bigger value range than we need in practice. We usually have to carry out a modular operation to map the hash results into a smaller range in the data structures listed above. In this procedure, the information carried by the high bits may be discarded. For example, if the bit array of a Bloom filter has length 2^20 while the hash functions we use are 32-bit hash functions, 12 bits of each hash result are discarded by the modular operation. We can use these bits to produce more hash values. Therefore, we propose to use a few bit operations to make full use of the information produced by one hash function and generate multiple hash values which can be used in these data structures. The SingleHash technique can be applied to most hash-based data structures. It can significantly improve their speed because, instead of carrying out multiple hash functions, we only need to compute one hash function and a few simple operations (e.g., bit shift and XOR). Other aspects of performance, like the memory efficiency and accuracy of these data structures, will not be influenced by the SingleHash technique.
In this paper, we apply SingleHash to three kinds of classic hash-based data structures, i.e., Bloom filters, CM sketches and d-left hash tables, as case studies, and evaluate their performance with both mathematical analysis and extensive experiments. We make all our code open source on GitHub.
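
To make the idea in the abstract concrete, the following is a minimal Python sketch of deriving several Bloom filter indexes from one 32-bit hash using only shifts and XOR. It is an illustration under stated assumptions, not the construction used in the paper: the choice of CRC32 as the single hash, the names `single_hash_indexes` and `BloomFilter`, and the particular mixing rule (XOR the 12 otherwise discarded high bits into the low 20 bits at different shift offsets) are all hypothetical.

```python
import zlib
from typing import List


def single_hash_indexes(key: bytes, k: int, index_bits: int = 20) -> List[int]:
    """Derive k indexes in [0, 2**index_bits) from ONE 32-bit hash value.

    Assumption: this mixing rule is illustrative only; the paper's exact
    bit operations may differ.
    """
    h = zlib.crc32(key) & 0xFFFFFFFF   # the single hash computation
    mask = (1 << index_bits) - 1
    low = h & mask                     # bits a plain "mod 2**20" would keep
    high = h >> index_bits             # the 12 bits that would be discarded
    # Reuse the high bits: XOR them into the low bits at k different offsets.
    return [(low ^ (high << i)) & mask for i in range(k)]


class BloomFilter:
    """Tiny Bloom filter whose k indexes all come from one hash call."""

    def __init__(self, index_bits: int = 20, k: int = 4) -> None:
        self.index_bits = index_bits
        self.k = k
        self.bits = bytearray((1 << index_bits) // 8)  # 2**20 bits = 128 KiB

    def add(self, key: bytes) -> None:
        for idx in single_hash_indexes(key, self.k, self.index_bits):
            self.bits[idx >> 3] |= 1 << (idx & 7)

    def __contains__(self, key: bytes) -> bool:
        return all(
            self.bits[idx >> 3] & (1 << (idx & 7))
            for idx in single_hash_indexes(key, self.k, self.index_bits)
        )


bf = BloomFilter()
bf.add(b"10.0.0.1")
print(b"10.0.0.1" in bf)  # an inserted key is always reported as present
```

Indexes derived this way are correlated with one another, so whether accuracy is preserved depends on the specific bit operations chosen; the paper addresses that question through its analysis and experiments.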

Full Text