Abstract

Memory fences are widely used to ensure the correctness of synchronization constructs on machines with relaxed consistency models. However, they are expensive and usually impose over-constrained ordering that causes unnecessary CPU stalls. In this paper, we observe that memory fences under total store order (TSO) are intended only to order synchronization variables. Based on this observation, we rethink the hardware-software interface of synchronization constructs on multicore processors and propose a new design called Sync-Order that differentiates synchronization variables (sync-vars) from normal ones. Sync-Order reduces hardware complexity because the processor only needs to serialize the ordering among sync-vars. Its simplicity makes it easy to integrate into the directory controller, and it supports distributed directories, a feature missing from prior designs. We show that Sync-Order eliminates traditional fences on all sides of synchronization constructs (instead of only one side, as in prior work) and requires little effort from a programmer or compiler to annotate sync-vars. Our experimental results show that Sync-Order significantly reduces CPU stalls and boosts the performance of a set of synchronization constructs and concurrent data structures by 10 percent; meanwhile, the fence overhead of full applications from SPLASH-2 and PARSEC is reduced from 42 to 3 percent.
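
To make the role of a fence under TSO concrete, the following is a minimal C++ sketch of the classic store-buffering (Dekker-style) pattern. It does not model Sync-Order or its annotation interface, which the abstract does not specify; it only illustrates the kind of conventional full fence between a store and a later load of synchronization variables that such a design aims to make unnecessary. The names `flag0`, `flag1`, `run_store_buffering_once`, and `count_both_zero` are hypothetical and chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Result of one run: the value each thread read from the other's flag.
struct Outcome {
    int r0;
    int r1;
};

// Each thread stores to its own flag, issues a full fence, then loads the
// other thread's flag. The two flags play the role of synchronization
// variables (assumption: illustrative names, not from the paper).
Outcome run_store_buffering_once() {
    std::atomic<int> flag0{0};
    std::atomic<int> flag1{0};
    int r0 = -1;
    int r1 = -1;

    std::thread t0([&] {
        flag0.store(1, std::memory_order_relaxed);
        // Full fence: keeps the later load from being ordered before the
        // earlier store (on x86 this is typically an mfence or a locked op).
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r0 = flag1.load(std::memory_order_relaxed);
    });
    std::thread t1([&] {
        flag1.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = flag0.load(std::memory_order_relaxed);
    });

    t0.join();
    t1.join();
    return Outcome{r0, r1};
}

// Counts how many runs ended with both threads reading 0. With the fences
// in place, the C++ memory model forbids that outcome.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        Outcome o = run_store_buffering_once();
        if (o.r0 == 0 && o.r1 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}

// Convenience check: aborts if the forbidden outcome is ever observed.
void check_fenced_store_buffering(int iterations) {
    assert(count_both_zero(iterations) == 0);
}
```

Without the two fences, a TSO machine may let each load complete while the preceding store still sits in the store buffer, so both threads can read 0. The fence prevents this but stalls the core until the store buffer drains, which is the kind of cost the abstract refers to.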

