Laziest write posting on bus-based SMPs

Y Lee

doi:10.1109/ccece.2000.849663

Abstract

Private caches are critical components for hiding memory access latency in high-performance multiprocessor systems. However, when a parallel program executes, multiple processors may concurrently update distinct portions of the same cache line and cause unnecessary cache invalidations under traditional cache coherence protocols. Such invalidations can be delayed when software enforces a proper order of memory reads and writes using synchronization primitives. Although delaying cache invalidation until the next synchronization instruction avoids unnecessary coherence traffic, it still incurs additional overhead to invalidate and reconcile the inconsistent cache copies. This paper presents a deferred coherence model that extends the traditional coherence protocol with new partially-modified states, allowing multiple writers to simultaneously update different portions of the same cache line. In addition, the proposed model separates the events of write notification and data reconciliation, so that updated data is posted only when another processor asks for it. Furthermore, an efficient merging mechanism is incorporated to reconcile multiple inconsistent copies of a modified line when potentially stale data is accessed. Execution-driven simulation of SPLASH-2 applications shows that the deferred coherence model can outperform the traditional eager coherence model by up to 20%.
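The merging step can be illustrated with a minimal sketch. The abstract does not say how modified portions are tracked, so the per-word dirty bitmask, the 8-word line size, and the names `LineCopy` and `merge` are all assumptions made for illustration, not the paper's mechanism. The sketch also assumes properly synchronized programs, in which no two processors write the same word between synchronization points.

```python
from dataclasses import dataclass

WORDS_PER_LINE = 8  # assumed line size in words (hypothetical)


@dataclass
class LineCopy:
    """One processor's private copy of a cache line (hypothetical model)."""
    data: list
    dirty: int = 0  # bit i set => word i was written locally

    def write(self, index, value):
        # Update the local copy only; nothing is posted to other caches yet.
        self.data[index] = value
        self.dirty |= 1 << index


def merge(base, copies):
    """Reconcile partially modified copies into one consistent line.

    Each word is taken from the copy that wrote it; words nobody wrote
    keep their value from `base`.
    """
    merged = list(base)
    seen = 0  # union of dirty masks merged so far
    for copy in copies:
        if copy.dirty & seen:
            # Assumption: overlapping writes indicate a data race.
            raise ValueError("overlapping writes: copies conflict")
        for i in range(WORDS_PER_LINE):
            if (copy.dirty >> i) & 1:
                merged[i] = copy.data[i]
        seen |= copy.dirty
    return merged


# Two processors update different words of the same line.
base = [0] * WORDS_PER_LINE
p0 = LineCopy(list(base))
p1 = LineCopy(list(base))
p0.write(0, 11)
p1.write(5, 55)
merged = merge(base, [p0, p1])  # reconciled only when a reader asks
```

In this sketch, reconciliation runs only when `merge` is called, which mirrors the idea of posting updated data on demand rather than at write time.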
