In this paper we formally study the guarantees provided by highly available, eventually consistent replicated systems in an environment in which machine failures and network splits are possible. Our analysis accounts for possible replica recovery after a crash, and clients that are (1) stateless or stateful, (2) sticky (always connect to a concrete set of replicas) or mobile, and (3) which can timeout before receiving a response to the sent request. We show why the approaches to prove protocol correctness prevalent in the literature, which do not take into account replica or network crashes, may lead to incorrect conclusions regarding the guarantees offered by the protocol. We adapt the existing formal correctness criteria, such as basic eventual consistency, to the considered environment by defining the family of failure-aware consistency guarantees. We formally identify a set of undesired phenomena (in particular phantom operations) observed by the clients, which, as we prove, are unavoidable in highly available systems in which unrecoverable replica crashes are possible. We also introduce context preservation, a new client-side requirement for eventually consistent systems that expose concurrency to the client, i.e., allow clients to use, e.g., multi-value registers or observed-remove sets. Context preservation is incomparable with classic session guarantees.
Read full abstract