Abstract
Building consensus sequences based on distributed, fault-tolerant consensus, as used for replicated state machines, typically requires a separate distributed state for every new consensus instance. Allocating and maintaining this state causes significant overhead. In particular, freeing the distributed, outdated states in a fault-tolerant way is not trivial and adds further complexity and cost to the system. In this paper, we propose an extension to the single-decree Paxos protocol that can learn a sequence of consensus decisions 'in-place', i.e. with a single set of distributed states. Our protocol does not require dynamic log structures and hence has no need for distributed log pruning, snapshotting, compaction, or dynamic resource allocation. The protocol builds a fault-tolerant atomic register that supports arbitrary read-modify-write operations. We use the concept of consistent quorums to detect whether the previous consensus still needs to be consolidated or is already finished so that the next consensus value can be safely proposed. Reading a consolidated consensus is done without state modifications and is thereby free of concurrency control and demand for serialisation. A proposer that is not interrupted reaches agreement on consecutive consensus decisions within a single message round-trip per decision by preparing the acceptors eagerly with the previous request.
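The consistent-quorum check mentioned in the abstract can be illustrated with a small Python sketch. It rests on an assumption not stated in this text: that a quorum counts as consistent when a majority of all acceptors report the same round and value. The `Reply` type, its field names, and the function `consistent_quorum` are hypothetical and exist only for this illustration.

```python
from collections import Counter
from typing import Hashable, NamedTuple, Optional, Sequence


class Reply(NamedTuple):
    """Hypothetical acceptor reply: the round it last accepted in, and the value."""
    round_no: int
    value: Hashable


def consistent_quorum(replies: Sequence[Reply], n_acceptors: int) -> Optional[Reply]:
    """Return the shared state if a majority of all acceptors report it, else None.

    Assumption: "consistent" means identical (round_no, value) pairs across
    a majority. None suggests the previous decision may still need to be
    consolidated before a new value is proposed.
    """
    majority = n_acceptors // 2 + 1
    if len(replies) < majority:
        return None  # not enough replies to form any quorum
    state, count = Counter(replies).most_common(1)[0]
    return state if count >= majority else None
```

Under this assumption, a proposer that sees a non-`None` result could treat the returned value as the latest decision and read it without modifying any acceptor state.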
Highlights
State machine replication [1] is a common technique for implementing distributed, fault-tolerant services.
Replicated state machine (RSM) implementations are centred around the use of a consensus protocol, as replicas must sequentially apply the same commands in the same order to prevent divergence.
Before presenting RMWPaxos, we introduce the notion of a consensus sequence register, an obstruction-free multi-writer, multi-reader register that performs any submitted write operation at-least-once.
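What at-least-once means for a non-idempotent write can be shown with a toy Python sketch. It does not model the consensus sequence register itself. `LossyRegister` and `submit` are hypothetical names, and the single dropped acknowledgement is an assumption chosen to force one retry.

```python
from typing import Callable


class LossyRegister:
    """Hypothetical register that applies every write but loses the first ack."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self._drop_next_ack = True

    def write(self, op: Callable[[int], int]) -> bool:
        self.value = op(self.value)  # the write always takes effect
        if self._drop_next_ack:
            self._drop_next_ack = False
            return False  # ack lost: the client cannot tell the write succeeded
        return True


def submit(register: LossyRegister, op: Callable[[int], int], max_attempts: int = 5) -> int:
    """Retry the write until it is acknowledged; return the number of attempts."""
    for attempt in range(1, max_attempts + 1):
        if register.write(op):
            return attempt
    raise RuntimeError("write was never acknowledged")
```

Submitting one increment to a fresh `LossyRegister` takes two attempts and leaves the value at 2 rather than 1. The write was performed at least once, but not exactly once.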
Summary
State machine replication [1] is a common technique for implementing distributed, fault-tolerant services. Replicated state machine (RSM) implementations are centred around the use of a consensus protocol, as replicas must sequentially apply the same commands in the same order to prevent divergence. A new command is processed by applying it to the current state and proposing the result as the value in a sequence of consensus decisions.
SKRZYPCZAK ET AL.: RMWPAXOS: FAULT-TOLERANT IN-PLACE CONSENSUS SEQUENCES
[...] results in RMWPaxos, a fault-tolerant general atomic read-modify-write (RMW) register. A consistent quorum indicates the most recent consensus decision. This makes it possible to propose a follow-up value in-place, i.e., without a command log or multiple independent consensus instances (Section 5.4). We show that by using ordered links, exactly-once semantics can be achieved to build an atomic RMW register, called RMWPaxos (Section 5.5).
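The idea of applying a command to the current state and storing the result in place, without a command log, can be sketched as a toy Python simulation. This is not the RMWPaxos protocol: it assumes a single proposer, no failures, and no concurrent writers, and it omits the prepare phase and the consistent-quorum check entirely. The names `Acceptor` and `read_modify_write` are hypothetical.

```python
from typing import Any, Callable, List, Tuple


class Acceptor:
    """Holds one (round, value) pair that is overwritten in place; there is no log."""

    def __init__(self, initial: Any) -> None:
        self.round_no = 0
        self.value = initial

    def read(self) -> Tuple[int, Any]:
        return self.round_no, self.value

    def write(self, round_no: int, value: Any) -> bool:
        if round_no <= self.round_no:
            return False  # reject stale rounds
        self.round_no, self.value = round_no, value
        return True


def read_modify_write(acceptors: List[Acceptor], command: Callable[[Any], Any]) -> Any:
    """Apply `command` to the newest stored value and store the result in place.

    Simplifying assumption: a single, uninterrupted proposer and no failures.
    """
    majority = len(acceptors) // 2 + 1
    states = [a.read() for a in acceptors]
    round_no, current = max(states, key=lambda s: s[0])  # state with highest round
    new_value = command(current)
    acks = sum(a.write(round_no + 1, new_value) for a in acceptors)
    if acks < majority:
        raise RuntimeError("no majority accepted the new value")
    return new_value
```

With three acceptors initialised to 0, applying `lambda x: x + 5` and then `lambda x: x * 2` yields 5 and then 10. Each acceptor still holds exactly one state afterwards, which is the property the in-place approach aims for.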