Two Elementary Instructions Make Compare-and-Swap

Pankaj Khanchandani, Roger Wattenhofer

doi:10.1109/ipdps.2019.00046

Abstract

The consensus number of an object is the maximum number of processes among which binary consensus can be solved using any number of instances of the object and read-write registers. Herlihy [1] showed in his seminal work that if an object has a consensus number of n, then its instances can be used to implement any non-trivial object or data structure that is shared among n processes, so that the implementation is wait-free and linearizable. Thus, an object such as compare-and-set, which has an infinite consensus number, is advanced because its instances can be used to implement any non-trivial concurrent object shared among any number of processes. On the other hand, objects such as fetch-and-add or fetch-and-multiply have a consensus number of two and are elementary. An important consequence of Herlihy's result was that any number of reasonable elementary objects is provably insufficient to implement an advanced object like compare-and-set. However, Ellen et al. [2] recently observed that real multiprocessors do not compute using objects but using instructions that are applied to memory locations. Using this observation, they showed that it is possible to use a couple of elementary instructions on the same memory location to implement an advanced one, and consequently any non-trivial object or data structure. However, that result only establishes possibility: it uses a generic universal construction as a black box, which is not how objects are implemented in practice, because the generic construction is quite inefficient with respect to the number of steps taken by a process and the number of shared objects used in the worst case. Instead, efficient implementations are built upon the widely supported compare-and-set instruction, and one cannot conclude from the previous result whether the elementary instructions can produce implementations as efficient as those based on compare-and-set or whether they are fundamentally limited in this respect.
In this paper, we answer this question by giving a wait-free and linearizable implementation of compare-and-set using just two elementary instructions, half-max and max-write. The implementation takes O(1) steps per process and uses O(1) shared objects per process. Thus, any known or unknown compare-and-set based implementation can also be realized using only these two elementary instructions without any loss in efficiency. An interesting aspect of these elementary instructions is that, depending on the underlying system, their throughput in a highly concurrent setting is larger than that of the compare-and-set instruction by a factor proportional to n.
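
To make the notion of consensus number used in the abstract concrete, the following is a minimal Python sketch, not the construction from the paper. It shows the textbook way to solve consensus among any number of processes with one compare-and-set location, and among exactly two processes with fetch-and-add plus read-write registers. The names `AtomicCell`, `decide_with_cas`, `decide_with_faa` and `run_cas_consensus` are hypothetical and introduced only for illustration. Python exposes no hardware atomic instructions, so a lock stands in for the atomicity of each instruction. The half-max and max-write instructions are not modelled here, since the abstract does not specify their semantics.

```python
import threading


class AtomicCell:
    """Hypothetical stand-in for one memory location.

    Assumption: a lock emulates the atomicity that hardware would provide.
    """

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_set(self, expected, new):
        # Atomically write `new` only if the cell currently holds `expected`.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def fetch_and_add(self, delta):
        # Atomically add `delta` and return the value held before the addition.
        with self._lock:
            old = self._value
            self._value = old + delta
            return old


def decide_with_cas(cell, proposal):
    """Consensus for any number of processes.

    Only the first compare-and-set on the empty cell succeeds, so every
    process reads back the same value. `None` marks the empty cell, so
    proposals must not be `None` (binary proposals 0 and 1 are fine).
    """
    cell.compare_and_set(None, proposal)
    return cell.read()


def decide_with_faa(counter, proposals, my_id, proposal):
    """Consensus for two processes (ids 0 and 1) using fetch-and-add.

    `proposals` is a two-slot list playing the role of read-write registers.
    """
    proposals[my_id] = proposal        # publish own proposal before competing
    if counter.fetch_and_add(1) == 0:  # the first process to increment wins
        return proposal
    return proposals[1 - my_id]        # the loser adopts the winner's proposal


def run_cas_consensus(proposals):
    """Run one thread per proposal and collect each thread's decision."""
    cell = AtomicCell(None)
    decisions = [None] * len(proposals)

    def worker(i):
        decisions[i] = decide_with_cas(cell, proposals[i])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(proposals))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions
```

Calling `run_cas_consensus([0, 1, 1, 0, 1])` returns five identical decisions, each equal to one of the proposed bits. The fetch-and-add variant works for two processes because the loser knows exactly whose proposal to adopt. With three or more processes a loser cannot tell which of the others incremented first, which is the intuition behind the consensus number of two.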

Full Text