Trading Fences with RMRs and Separating Memory Models

Hagit Attiya, Danny Hendler, Philipp Woelfel

doi:10.1145/2767386.2767427

Abstract

Out-of-order execution of instructions is a common optimization technique for multicores and multiprocessors, which is governed by the memory model of the architecture. Relatively strong memory models, like TSO (supported by x86 and AMD), only allow reads to bypass earlier writes, while other models, like RMO (supported by ARM, POWER and Alpha) and PSO (supported by older SPARC), also allow the reordering of writes to different locations. These reorderings can be prevented by the use of costly fence instructions.

In this paper we prove that when writes can be reordered (e.g., in RMO or even PSO), there is a tradeoff between the number of fences, f, and the number of remote memory references (RMRs), r, for a large class of objects, including locks, counters and queues: $f \cdot (\log(r/f) + 1) \in \Omega(\log n)$. For example, when one of these objects is implemented using a constant number of fences (e.g., in the Bakery lock), the tradeoff implies that a linear number of RMRs is required (as indeed is the case with the Bakery lock). This gives a complexity separation between the memory models that allow write reordering and those that prohibit it, since a recent paper shows that a lock (and related objects) can be implemented in the stronger TSO memory model with a small, constant number of fences and a logarithmic number of RMRs.

The lower bound uses an information-theoretic argument, relating the encoding of n! distinguishable executions to the number of fences and RMRs performed in the course of these executions.

We also present a family of algorithms matching the lower bound, which explicitly enforce the required ordering, and hence are correct even with weak memory models. This shows that the tradeoff is tight, and indicates that for many important objects, fences are mostly needed for avoiding reordering of writes.
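The tradeoff stated in the abstract can be rearranged to show how the RMR count must grow as the fence count shrinks. The following is only a sketch: the constant c > 0 is an assumption standing in for the constant hidden in the Ω-notation, logarithms are taken base 2, and the exact constants (on which the abstract's "linear" claim for constant f depends) are not recoverable from the abstract alone.

```latex
% Assume f (log(r/f) + 1) >= c log n for some constant c > 0 and all large n.
\begin{align*}
  f\left(\log\frac{r}{f} + 1\right) &\ge c \log n \\
  \log\frac{r}{f} &\ge \frac{c \log n}{f} - 1 \\
  r &\ge f \cdot 2^{\frac{c \log n}{f} - 1} \;=\; \frac{f}{2}\, n^{c/f}
\end{align*}
% Two regimes:
%   f \in O(1)            =>  r \in n^{\Omega(1)}   (polynomially many RMRs)
%   f \in \Theta(\log n)  =>  r \in \Omega(\log n)  (logarithmic RMRs are not ruled out)
```

Read this way, each additional fence lowers the exponent c/f, so the RMR lower bound drops from polynomial in n toward logarithmic as f grows from a constant to Θ(log n).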

Full Text