Performance evaluation and cost analysis of cache protocol extensions for shared-memory multiprocessors

F Dahlgren,M Dubois,P Stenstrom

doi:10.1109/12.729785

Abstract

We evaluate three extensions to directory-based cache coherence protocols in shared-memory multiprocessors. These extensions are aimed at reducing the penalties associated with memory accesses and include a hardware prefetching scheme, a migratory sharing optimization, and a competitive-update mechanism. Since each extension targets distinct components of the read and write penalties, they can be combined effectively. This paper identifies the combinations yielding the best performance gains and cost trade-offs in the context of a class of cache-coherent NUMA (Non-Uniform Memory Access) architectures. Detailed architectural simulations of a multiprocessor with single-issue, statically scheduled CPUs, using five benchmarks, show that the protocol extensions often provide additive gains when they are properly combined. For example, the combination of prefetching with the competitive-update mechanism speeds up the execution by nearly a factor of two under release consistency. The same speedup is obtained under sequential consistency by combining prefetching with the migratory sharing optimization. This paper shows that a basic write-invalidate protocol augmented by appropriate extensions can eliminate most memory access penalties without any support from the programmer or the compiler.

Full Text