Fast Consensus Using Bounded Staleness for Scalable Read-Mostly Synchronization

Haibo Chen, Haibing Guan, Heng Zhang, Ran Liu, Binyu Zang

doi:10.1109/tpds.2016.2539953

Abstract

Read-mostly synchronization schemes, such as rwlocks and RCU, aim to maximize parallelism among readers, but many existing designs cause readers to contend, significantly extend writer latency, or both. This paper attributes the problem to the lack of a fast consensus protocol between readers and writers, by which the two parties cooperate to obey the semantics of a synchronization construct. This paper describes FCP, a fast consensus protocol among readers and writers that provides scalable read-side performance as well as small writer latency on TSO architectures. The heart of FCP is a version-based consensus protocol between multiple non-communicating readers and a pending writer. FCP leverages the bounded staleness of memory consistency to avoid atomic instructions and memory barriers in readers’ common paths, and uses message passing (e.g., IPIs) to reach straggling readers so that writer latency can be bounded. To demonstrate the effectiveness of FCP, this paper applies it to construct a scalable reader-writer lock (rwlock) and a scalable RCU implementation. Evaluation on a 64-core machine shows that FCP significantly boosts the performance of the Linux virtual memory subsystem, a concurrent hash table, and an in-memory database. Micro-benchmarks show that FCP achieves lower reader-side and writer-side latency than state-of-the-art rwlocks and RCU implementations.
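The version-based handshake summarized in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the FCP algorithm itself: it is single-threaded, so it does not model TSO ordering, bounded staleness, or real IPIs, and every name in it (`Reader`, `ToyConsensus`, `nudge_stragglers`, and so on) is hypothetical. It shows only the bookkeeping idea: the writer bumps a global version, each reader acknowledges that version in a slot that only it writes, and the writer messages any reader that has not yet acknowledged.

```python
class Reader:
    """Hypothetical reader with a private slot that only this reader updates."""

    def __init__(self):
        self.seen = 0        # last global version this reader acknowledged
        self.active = False  # True while inside a read-side critical section


class ToyConsensus:
    """Single-threaded toy model; NOT the paper's FCP implementation."""

    def __init__(self, n_readers):
        self.version = 0             # bumped only by the writer
        self.writer_pending = False
        self.readers = [Reader() for _ in range(n_readers)]
        self.messages_sent = 0       # stands in for IPIs sent to stragglers

    def read_lock(self, r):
        if self.writer_pending:
            r.seen = self.version    # acknowledge the pending writer, back off
            return False
        r.active = True              # common path: touches only the reader's slot
        return True

    def read_unlock(self, r):
        r.active = False
        r.seen = self.version        # report the newest version on the way out

    def writer_begin(self):
        self.writer_pending = True
        self.version += 1

    def lagging(self):
        # Readers that have not yet acknowledged the current version.
        return [r for r in self.readers if r.seen != self.version]

    def nudge_stragglers(self):
        # Assumption: an idle reader acknowledges from its message handler,
        # while a reader inside a critical section acknowledges at unlock.
        for r in self.lagging():
            self.messages_sent += 1
            if not r.active:
                r.seen = self.version

    def writer_may_proceed(self):
        return not self.lagging()

    def writer_end(self):
        self.writer_pending = False


if __name__ == "__main__":
    lock = ToyConsensus(3)
    a, b, c = lock.readers
    lock.read_lock(a)                 # a is inside a critical section
    lock.writer_begin()
    lock.read_lock(b)                 # b notices the writer and acknowledges
    lock.nudge_stragglers()           # c is idle and acknowledges; a is still busy
    print(lock.writer_may_proceed())  # False: a has not left its section
    lock.read_unlock(a)
    print(lock.writer_may_proceed())  # True: every reader agrees on the version
```

In this model the writer's wait is limited by the longest read-side critical section plus one round of messages, which mirrors the abstract's point that messaging stragglers keeps writer latency bounded.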

Full Text