Abstract

We give fault-tolerant algorithms for establishing synchrony in distributed systems in which each of the n nodes has its own clock. Our algorithms operate in a very strong fault model: we require self-stabilisation, i.e., the initial state of the system may be arbitrary, and there can be up to f < n/3 ongoing Byzantine faults, i.e., nodes that deviate from the protocol in an arbitrary manner. Furthermore, we assume that the local clocks of the nodes may progress at different speeds (clock drift) and communication has bounded delay. In this model, we study the pulse synchronisation problem, where the task is to guarantee that eventually all correct nodes generate well-separated local pulse events (i.e., unlabelled logical clock ticks) in a synchronised manner.

Compared to prior work, we achieve exponential improvements in stabilisation time and the number of communicated bits, and give the first sublinear-time algorithm for the problem:

  • In the deterministic setting, the state-of-the-art solutions stabilise in time Θ(f) and have each node broadcast Θ(f log f) bits per time unit. We exponentially reduce the number of bits broadcasted per time unit to Θ(log f) while retaining the same stabilisation time.

  • In the randomised setting, the state-of-the-art solutions stabilise in time Θ(f) and have each node broadcast O(1) bits per time unit. We exponentially reduce the stabilisation time to polylog f while each node broadcasts polylog f bits per time unit.

These results are obtained by means of a recursive approach reducing the above task of self-stabilising pulse synchronisation in the bounded-delay model to non-self-stabilising binary consensus in the synchronous model. In general, our approach introduces at most logarithmic overheads in terms of stabilisation time and broadcasted bits over the underlying consensus routine.
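
To make the problem statement concrete, the following Python sketch checks a recorded trace of pulse times against the two informal requirements named above: pulses are generated "in a synchronised manner" and are "well-separated". It is an illustration only, not the paper's algorithm. The function names and the parameters `skew` and `min_gap` are assumptions introduced here, and the precise bounds used in the paper may be stated differently.

```python
from typing import Dict, List


def max_faults(n: int) -> int:
    """Largest integer f satisfying f < n/3 (the resilience bound in the abstract)."""
    return (n - 1) // 3


def pulses_synchronised(
    pulse_times: Dict[str, List[float]],  # hypothetical trace: correct node -> pulse times
    skew: float,      # assumed bound on the spread of the k-th pulse across nodes
    min_gap: float,   # assumed minimum separation between consecutive pulse rounds
) -> bool:
    """Check a trace taken after stabilisation (illustrative definition)."""
    # Group the k-th pulse of every correct node together.
    rounds = list(zip(*pulse_times.values()))

    # Synchronised: within each round, all pulses fall inside a window of width skew.
    for times in rounds:
        if max(times) - min(times) > skew:
            return False

    # Well-separated: the earliest pulse of the next round comes at least
    # min_gap after the latest pulse of the current round.
    for earlier, later in zip(rounds, rounds[1:]):
        if min(later) - max(earlier) < min_gap:
            return False
    return True


if __name__ == "__main__":
    trace = {"a": [0.0, 10.0, 20.0], "b": [0.5, 10.4, 20.2]}
    print(max_faults(4), pulses_synchronised(trace, skew=1.0, min_gap=5.0))
```

The checker only looks at correct nodes, since Byzantine nodes may pulse arbitrarily; how the trace is collected is left open.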

Highlights

  • Many of the most fundamental problems in distributed computing relate to timing and fault tolerance

  • The vast majority of existing Very Large Scale Integrated (VLSI) circuits operate according to the synchronous paradigm: an internal clock signal is distributed throughout the chip, neatly controlling alternation between computation and communication steps

  • Taking the uncertainty of unknown message delays and drifting clocks out of the equation leads to the so-called digital clock synchronisation problem [3, 14, 24, 26], where communication proceeds in synchronous rounds and the task is to agree on a consistent round counter


Summary

Introduction

Many of the most fundamental problems in distributed computing relate to timing and fault tolerance. Taking the uncertainty of unknown message delays and drifting clocks out of the equation leads to the so-called digital clock synchronisation problem [3, 14, 24, 26], where communication proceeds in synchronous rounds and the task is to agree on a consistent (bounded) round counter. While this abstraction is unrealistic as a basic system model, it yields conceptual insights into the pulse synchronisation problem in the bounded-delay model. Once pulse synchronisation is solved, it is useful to assign numbers to the pulses in order to obtain a fully-fledged shared system-wide clock [25].
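
As a small illustration of what "agree on a consistent (bounded) round counter" can mean, the sketch below checks a recorded history of counter values held by the correct nodes. It assumes, for illustration only, that consistency means all correct nodes hold the same value in each round and that the value advances by one modulo a fixed bound; the name `counters_consistent` and the parameter `modulus` are hypothetical and not taken from the paper.

```python
from typing import List


def counters_consistent(history: List[List[int]], modulus: int) -> bool:
    """history[r] lists the counter values of the correct nodes in round r."""
    agreed = []
    for values in history:
        # Agreement: every correct node reports the same counter value.
        if not values or len(set(values)) != 1:
            return False
        # Boundedness: the counter stays in the range 0 .. modulus - 1.
        if not 0 <= values[0] < modulus:
            return False
        agreed.append(values[0])
    # Progress: the common value increases by one (mod modulus) every round.
    return all((a + 1) % modulus == b for a, b in zip(agreed, agreed[1:]))


if __name__ == "__main__":
    print(counters_consistent([[2, 2, 2], [3, 3, 3], [0, 0, 0]], modulus=4))
```

The same check applies to numbered pulses: if each pulse is labelled with such a counter, the labels act as readings of a shared bounded clock.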
