Determining order relationship between events of a distributed computation is a fundamental problem in distributed systems which has applications in many areas including debugging, visualization, checkpointing and recovery. Fidge/Mattern’s vector-clock mechanism captures the order relationship using a vector of size N in a system consisting of N processes. As a result, it incurs message and space overhead of N integers. Many distributed applications use synchronous messages for communication. It is therefore natural to ask whether it is possible to reduce the timestamping overhead for such applications. In this paper, we present a new approach for timestamping messages and events of a synchronously ordered computation, that is, when processes communicate using synchronous messages. Our approach depends on decomposing edges in the communication topology into mutually disjoint edge groups such that each edge group either forms a star or a triangle. We show that, to accurately capture the order relationship between synchronous messages, it is sufficient to use one component per edge group in the vector instead of one component per process. Timestamps for events are only slightly bigger than timestamps for messages. Many common communication topologies such as ring, grid and hypercube can be decomposed into $${\lceil N/2 \rceil}$$ edge groups, resulting in almost 50% improvement in both space and communication overheads. We prove that the problem of computing an optimal edge decomposition of a communication topology is NP-complete in general. We also present a heuristic algorithm for computing an edge decomposition whose size is within a factor of two of the optimal. We prove that, in the worst case, it is not possible to timestamp messages of a synchronously ordered computation using a vector containing fewer than $${2 \lfloor N/6 \rfloor}$$ components when N ≥ 2. Finally, we show that messages in a synchronously ordered computation can always be timestamped in an offline manner using a vector of size at most $${\lfloor N/2 \rfloor}$$ .
Read full abstract