Efficient Self-Invalidation/Self-Downgrade for Critical Sections with Relaxed Semantics

Alberto Ros, Stefanos Kaxiras, Christos Sakalis, Carl Leonardsson

doi:10.1109/tpds.2017.2720744

Abstract

Cache coherence protocols based on self-invalidation allow a simpler hardware implementation than traditional write-invalidation protocols, by relying on data-race-free semantics and applying self-invalidation at synchronization points. Their simplicity lies in the absence of invalidation traffic. This eliminates the need to track readers in a directory and reduces the number of transient protocol states. Similarly, the use of self-downgrade on synchronization eliminates directory indirection, and hence the need to track writers in a directory. These protocols, effectively without a directory, have the potential to reduce area, energy consumption, and complexity without sacrificing performance, provided that self-invalidation and self-downgrade are performed prudently. In this work we examine how self-invalidation and self-downgrade are performed in relation to atomicity and ordering. We show that self-invalidation and self-downgrade do not need to be applied as conservatively as they have been so far. Our key observation is that critical sections that are not ordered in time are often intended to provide only atomicity and not thread synchronization. We thus propose a new type of self-invalidation, forward self-invalidation (FSI), which invalidates solely data that are going to be accessed inside a critical section. Based on the same reasoning, we propose a new type of self-downgrade, forward self-downgrade (FSD), also restricted to writes in critical sections. Finally, we define the semantics of locks using FSI and FSD, which resemble the semantics of relaxed atomic operations in C++.
Our evaluation for 64-core multiprocessors shows significant improvements using the proposed FSI and FSD, where applicable, in Splash-3 and PARSEC benchmarks, over a directory-based protocol (17.1 percent in execution time and 33.9 percent in energy consumption) and also over a state-of-the-art self-invalidation/self-downgrade protocol (7.6 percent in execution time and 9.1 percent in energy consumption), while still retaining the design simplicity of the protocol.
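The key observation above, that some critical sections are needed only for atomicity and not for ordering between threads, can be illustrated with a small software-level sketch. It is not taken from the paper and does not model FSI or FSD hardware; the function name `count_with_lock` and its parameters are hypothetical, chosen only for this illustration. Each thread increments a shared counter inside a lock, no thread depends on the order in which the critical sections execute, and the only point where one thread must observe the others' writes is after `join()`.

```python
import threading


def count_with_lock(num_threads: int, increments: int) -> int:
    """Hypothetical example: a shared counter updated in short critical sections."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # The critical section only makes the read-modify-write atomic.
            # No thread relies on which critical section ran before it.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    # join() is the actual synchronization point: the caller reads the
    # final value only after every worker has finished.
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(count_with_lock(4, 1000))
```

In this pattern only the counter needs to be kept consistent across the lock, which is the kind of critical section the abstract describes as a candidate for the relaxed lock semantics. A critical section that publishes data for a later reader outside the lock would instead rely on ordering and would not fit this pattern.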

Full Text