Correctness of a gossip based membership protocol

André Allavena, Alan Demers, John E. Hopcroft

doi:10.1145/1073814.1073871

Abstract

The importance of scalability and fault tolerance in modern distributed systems has led to considerable research in multicast protocols using gossip. In a gossip protocol, each node forwards messages to a small set of “gossip partners” chosen at random from the entire group membership. By discarding the strong reliability guarantees of traditional protocols in favour of probabilistic guarantees, gossip protocols can deliver greater scalability and fault tolerance. In early gossip algorithms, partners were chosen uniformly at random from the entire membership, limiting scalability because of the resources required to store and maintain complete membership views at each node. Later protocols avoided this issue by storing much smaller random subsets of the membership at each node, and choosing gossip partners only from these local views. Such protocols are subtle: at least some local views must change in response to group membership changes in order to preserve connectivity and performance guarantees. While these protocols have been the subject of much simulation and analysis, formal proofs of key properties – in particular the probability of partitioning – have remained elusive. In this paper we give a new scalable gossip-based algorithm for local view maintenance, together with a proof that the expected time until a network partition is at least exponential in the square of the view size. We also develop probabilistic bounds on the in-degree (hence the load) of individual nodes, and argue that protocols lacking our reinforcement component eventually converge to star-like networks, whose connectivity depends on a small set of overloaded nodes. We also argue that the undirected connectivity graph is an expander, on which application-level gossip multicast protocols will converge rapidly. Our theoretical results are supported by simulations.
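As a rough illustration of the general setting described in the abstract (push gossip in which each node picks partners only from a small local view), the following Python sketch may help. It is not the paper's view-maintenance algorithm and has no reinforcement component: the views are static and drawn uniformly at random, and the names `build_views` and `gossip_rounds`, along with the parameters `view_size`, `fanout` and `max_rounds`, are hypothetical choices made for this example.

```python
import random


def build_views(n, view_size, rng):
    """Assumption: each node gets a static, uniformly random view of other nodes."""
    return {
        node: rng.sample([m for m in range(n) if m != node], view_size)
        for node in range(n)
    }


def gossip_rounds(views, source, fanout, rng, max_rounds=100):
    """Push gossip: every informed node forwards to `fanout` partners
    drawn at random from its own local view, once per round.
    Returns (rounds used, set of informed nodes)."""
    informed = {source}
    for round_no in range(1, max_rounds + 1):
        newly_informed = set()
        for node in informed:
            view = views[node]
            partners = rng.sample(view, min(fanout, len(view)))
            newly_informed.update(partners)
        informed |= newly_informed
        if len(informed) == len(views):
            return round_no, informed
    # Some nodes may be unreachable if the random views leave them with no in-links.
    return max_rounds, informed


rng = random.Random(0)  # fixed seed so repeated runs behave the same
views = build_views(n=200, view_size=8, rng=rng)
rounds, informed = gossip_rounds(views, source=0, fanout=3, rng=rng)
print(f"informed {len(informed)} of {len(views)} nodes after {rounds} round(s)")
```

Because the views here never change, the sketch says nothing about partitioning under membership churn, which is the property the paper analyses; it only shows how dissemination proceeds when partners come from local views rather than from the full membership.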

Full Text