ABSTRACT

Introduction
Distributed data processing and storage systems require efficient methods to distribute keys across buckets. While simple and fast, the traditional modulo-based mapping is unstable when the number of buckets changes, leading to spikes in system resource utilization, such as network or database requests. Consistent hash algorithms minimize remappings but are significantly slower, require floating-point arithmetic, or rely on a family of hash functions rarely available in standard libraries. This work introduces JumpBackHash, a consistent hash algorithm that overcomes these shortcomings.

Methodology
JumpBackHash applies the concept of active indices borrowed from consistent weighted sampling, which inherently leads to consistency. Generating the active indices in reverse order avoids floating-point operations, minimizes the number of consumed random values, permits the use of a standard pseudorandom generator, and ultimately yields a very efficient algorithm.

Results
Theoretical analysis shows that JumpBackHash has an expected constant runtime. The expected value and the variance of the number of consumed random values agree perfectly with the experiments. Empirical tests also confirm its consistency.

Conclusion
JumpBackHash offers a fast and efficient solution for uniformly distributing keys across buckets in distributed systems. Its simplicity, its performance, and the availability of a production-ready Java implementation in the Hash4j open source library make it a viable replacement for the modulo-based approach, improving assignment and system stability.
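The contrast between modulo-based mapping and consistent assignment via active indices generated in reverse order can be illustrated with a small Java sketch. This is NOT JumpBackHash and does not use the Hash4j API. It is a simplified illustration under our own assumptions: index 0 is always active, index i >= 1 is active with probability 1/(i+1), and a key is assigned to the largest active index below the bucket count. With these probabilities, bucket j is chosen with probability (1/(j+1)) times the product over i = j+1..n-1 of i/(i+1), which telescopes to 1/n, so every bucket is equally likely. Active indices are generated top-down within power-of-two intervals [2^k, 2^(k+1)), each from a `java.util.SplittableRandom` seeded by the key and k (a hypothetical seeding scheme). Fixed-point integer multiplication replaces floating point. Because the active indices never depend on the bucket count, growing from n to n+1 buckets can only move a key into the new bucket n.

```java
import java.util.SplittableRandom;

public class ConsistentBucketDemo {

  // Traditional modulo-based mapping: fast, but most keys move when numBuckets changes.
  static int moduloBucket(long key, int numBuckets) {
    return (int) Long.remainderUnsigned(key, numBuckets);
  }

  // Illustrative sketch of reverse-order active indices (NOT the JumpBackHash algorithm).
  // Assumption: index 0 is always active, index i >= 1 is active with probability 1/(i+1),
  // and a key maps to the largest active index below numBuckets. Active indices in each
  // interval [2^k, 2^(k+1)) come from a random stream seeded only by (key, k).
  static int reverseActiveIndexBucket(long key, int numBuckets) {
    if (numBuckets <= 1) {
      return 0;
    }
    // Interval that contains the largest candidate index, numBuckets - 1.
    int k = 31 - Integer.numberOfLeadingZeros(numBuckets - 1);
    for (; k >= 0; k--) {
      long lower = 1L << k;
      long upper = lower << 1; // exclusive bound for the next active index
      // Hypothetical seeding scheme: a standard JDK generator keyed by (key, k).
      SplittableRandom rng = new SplittableRandom(key ^ (0xC2B2AE3D27D4EB4FL * (k + 1)));
      while (true) {
        // Next active index below 'upper' is floor(upper * U), U uniform in [0, 1),
        // computed in 32-bit fixed point (up to rounding) instead of floating point.
        long candidate = (upper * (rng.nextLong() >>> 32)) >>> 32;
        if (candidate < lower) {
          break; // no further active index in this interval
        }
        if (candidate < numBuckets) {
          return (int) candidate; // largest active index below numBuckets
        }
        upper = candidate; // continue walking down through this interval
      }
    }
    return 0; // index 0 is always active
  }

  public static void main(String[] args) {
    SplittableRandom keys = new SplittableRandom(12345L);
    int numKeys = 100_000;
    int movedModulo = 0;
    int movedSketch = 0;
    for (int i = 0; i < numKeys; i++) {
      long key = keys.nextLong();
      if (moduloBucket(key, 10) != moduloBucket(key, 11)) {
        movedModulo++;
      }
      if (reverseActiveIndexBucket(key, 10) != reverseActiveIndexBucket(key, 11)) {
        movedSketch++;
      }
    }
    System.out.printf("modulo: %.3f of keys moved%n", movedModulo / (double) numKeys);
    System.out.printf("sketch: %.3f of keys moved%n", movedSketch / (double) numKeys);
  }
}
```

When going from 10 to 11 buckets, one would expect roughly 10/11 of the keys to move under the modulo mapping. Under the consistent sketch, only about 1/11 should move, namely the keys for which index 10 is active.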