Abstract

Given a checkpoint and communication pattern of a distributed execution, the No Z-Cycle property (NZC) states that a dependency between a checkpoint and itself does not exist. In other words, a noncausal sequence of messages that starts after a checkpoint and terminates before that checkpoint does not exist. From an operational point of view, this property corresponds to the fact that each checkpoint belongs to at least one consistent global checkpoint. So it could be used, for example, for restarting a distributed application after the occurrence of a failure. In this paper we derive a characterization of the NZC property (previously an open problem). It identifies a subset of Z-cycles, namely core Z-cycles (CZCs), that has to be empty in order that the checkpoint and communication pattern of the execution satisfies the NZC property. Then, we present a communication-induced checkpointing protocol that prevents CZCs on-the-fly. This protocol actually removes the common causal part to any CZC. Finally we propose a taxonomy of communication-induced checkpointing protocols that ensure the NZC property.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call