Abstract

Chip multiprocessors (CMPs) play a significant role in modern parallel computer architecture. To integrate tens of cores on a chip, designs tend toward physically distributed last-level caches. This naturally results in a Non-Uniform Cache Access (NUCA) design, where on-chip access latency depends on the physical distance between the requesting core and the home core where the data is cached. Data movement and management therefore affect both access latency and power consumption. Because remote misses limit the performance of multi-threaded applications, exploiting data locality is of fundamental importance in CMPs. In this work we observe that the writing behavior of shared data wastes precious on-chip cache resources and seriously degrades overall system performance. We therefore focus on improving the performance of applications that exhibit a high degree of data sharing, and propose a new prediction mechanism that accurately predicts the impact of shared data, together with a scalable, efficient hybrid sharing-aware cache coherence transition strategy that collaborates with a directory-based MESI cache coherence protocol. To evaluate the proposed transition strategy, we experiment with the NAS Parallel Benchmarks on a modern Intel Harpertown multi-core machine. Results show overall performance gains of up to 21% over the traditional write-invalidate cache coherence transition strategy.
