The Distributed Firing Squad Problem

Brian A. Coan, Cynthia Dwork, Danny Dolev, Larry Stockmeyer

doi:10.1137/0218068

Abstract

The distributed firing squad problem is defined in the context of a synchronous distributed system where the correct processors operate in lock-step synchrony but do not share a global clock. If one or more correct processors receive a command to start a firing squad synchronization, then at some future time all correct processors must “fire” (formally, enter a special state) at exactly the same step. For various fault models, upper and lower bounds are proved on the number of faulty processors that can be tolerated and on the number of rounds of communication required between the reception of the start command and firing. For example, if a firing squad protocol is resilient to $t$ fail-stop faults, then $t + 1$ rounds are necessary and sufficient. For the case of Byzantine faults with authentication where the faulty processors can take steps in between the synchronous steps of the correct processors, the firing squad problem can be solved in $t + 5$ rounds, provided that $n > 3t$, where $n$ is the number of processors and $t$ is the number of faults, and the problem cannot be solved at all if $n \leq 3t$. Moreover, in the case that $n \leq 3t$, the impossibility of a firing squad protocol holds even for a weaker “timing fault model” where all processors generate messages correctly according to the protocol, but the faulty processors can affect the system by slightly slowing down or speeding up messages.
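The correctness condition and the bounds quoted above can be made concrete with a minimal sketch. It is an illustration only, not a protocol from the paper: the function names, the representation of a run as a mapping from processor name to firing step, and the use of `None` for "not solvable" or "never fired" are all assumptions introduced here.

```python
from typing import AbstractSet, Mapping, Optional


def fires_simultaneously(
    fire_step: Mapping[str, Optional[int]],
    correct: AbstractSet[str],
) -> bool:
    """Check the firing condition on one recorded run (hypothetical format).

    fire_step maps a processor name to the step at which it fired, or None
    if it never fired. Only processors in `correct` are constrained; faulty
    processors may fire at any step or not at all.
    """
    steps = {fire_step.get(p) for p in correct}
    # Exactly one distinct value, and that value must be an actual step.
    return len(steps) == 1 and None not in steps


def fail_stop_rounds(t: int) -> int:
    """Round count stated in the abstract for t fail-stop faults."""
    return t + 1


def byzantine_auth_rounds(n: int, t: int) -> Optional[int]:
    """Round count stated in the abstract for authenticated Byzantine faults
    in the model where faulty processors can act between synchronous steps.

    Returns None when n <= 3t, the case the abstract says is unsolvable.
    """
    return t + 5 if n > 3 * t else None
```

For example, with $n = 4$ and $t = 1$ the resilience condition $n > 3t$ holds and the stated bound is $t + 5 = 6$ rounds, whereas $n = 3$ and $t = 1$ falls in the impossible case.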
