Approximate Hash-Based Set Reconciliation for Distributed Replica Repair

Nico Kruber,Maik Lange,Florian Schintke

doi:10.1109/srds.2015.30

Abstract

The objective comparison of hash-based set reconciliation algorithms for distributed replica repair is challenging. Each algorithm's behaviour can be tuned for a given use case, e.g. low bandwidth or computational overhead, using different sets of parameters. Changes on these parameters, however, often also influence the algorithm's accuracy in recognising differences between replicas and thus hinder objective comparisons. We develop models to deduce parameters for equally accurate set reconciliation algorithms for replica repair in a distributed system and compare equally accurate instances of two trivial hash-based algorithms, an algorithm using Bloom filters and a Merkle tree based algorithm. Instead of using a large fixed hash size for Merkle trees we propose to use dynamic hash sizes to align the transfer overhead with the desired accuracy. We evaluate (a) the transferred volume of data with respect to different entropy levels, data and failure distributions on the set of items, and (b) the scalability in the number of items. Our results allow to easily choose an efficient algorithm for practical set reconciliation tasks based on the required level of accuracy. Our way to find equally accurate configuration parameters for different algorithms can also be adopted to other set reconciliation algorithms and allows to rate their respective performance in an objective manner.

Full Text