Abstract

Chip-multiprocessors (CMPs) have become the mainstream parallel architecture in enterprise and scientific computing facilities. For scalability reasons, designs with larger core counts tend towards physically distributed hardware caches. This naturally results in a Non-Uniform Cache Access (NUCA) design, where data movement and management affect access latency and consume power. In this work, we observe that writes to shared data waste precious on-chip cache resources and seriously degrade whole-system performance because of the high remote access latency. We therefore propose a new prediction mechanism that predicts the impact of shared data, together with a directory-based MESI cache coherence protocol that uses a selective write-shared-data-update transition strategy instead of the native write-invalidate strategy. We evaluate our proposal on a modern multi-core machine with the NAS Parallel Benchmarks. Experimental results show speedups of up to 21% compared with the native write-invalidate transition strategy.

