Abstract
In many computing applications, there is a need to estimate different features of large data sets, such as the number of distinct elements or the similarity with other data sets. This can be efficiently implemented using probabilistic data structures known as sketches; for example, Hyperloglog is widely used for cardinality estimate and Minhash for similarity estimate. Many of these sketches rely on computing hash functions on the elements and counting the number of consecutive leading zeros (or ones) to identify the minimum (or maximum) hash value for data updating. This functionality usually accounts for the most demanding part of the sketch implementation; an efficient and low power hash computation and leading zero counting are therefore required for high performance. A novel but important observation is exploited in this paper; namely the trailing hash bits are rarely used in these sketches (e.g., the leading zeros counting stops once the first one is found, which usually occurs after a few bits). Based on this feature, a so-called Two-Step Hashing (TSH) scheme is proposed in this paper to significantly reduce the power consumption. Therefore rather than computing all hash bits (as in the traditional scheme), TSH computes a few leading hash bits in a first step and the trailing hash bits in the second step, that is only required when the leading bits are all zeros (or ones). Hence, in most cases only a fraction of the hash bits is computed, so significantly reducing the power dissipation of hash computations when they are used to check the number of consecutive leading zeros, or the minimum value. The use of TSH for sketches with both single and multiple pipeline implementations are considered; evaluation results based on HyperLogLog (HLL) as a sketch with a Cyclic Redundancy Check hash function show that TSH saves up to 75.9 % (78.7 %) power dissipation in a single (multiple) pipeline implementation.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.