Abstract

Most processors employ hardware data prefetching to hide memory access latencies. However, prefetching requests from different threads on a multi-core processor can severely interfere with the prefetching and/or demand requests of other threads. On shared-memory multi-core systems, data prefetching can therefore lead to significant performance degradation due to contention for shared resources. This paper proposes a thread-aware data prefetching mechanism that uses low-overhead runtime information to tune prefetching modes and aggressiveness, mitigating resource contention in the memory system. Our solution has two new components: 1) a filtering mechanism that informs the hardware about which prefetching requests can cause shared data invalidation and should be discarded, and 2) a self-tuning prefetcher that uses runtime feedback to adjust each thread's data prefetching mode and arguments. On a set of parallel benchmarks, our thread-aware data prefetching mechanisms improve the overall performance of a 64-core system by 11% and reduce the energy-delay product by 13% over a multi-mode prefetch baseline system with a two-level cache organization and a conventional MESI-based directory coherence protocol. We compare our approach to the feedback-directed prefetching (FDP) technique and find that our approach provides better performance on multi-core systems while reducing the energy-delay product.
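To make the two components concrete, the following is a minimal software sketch of the general idea, not the hardware design evaluated in the paper. Everything in it is an assumption made for illustration: the names `should_drop`, `retune`, `ThreadPrefetchState`, and `LEVELS`, the use of prefetch accuracy as the only feedback signal, the accuracy thresholds, and the rule that a prefetch is dropped when it requests exclusive ownership of a line held by another core.

```python
from dataclasses import dataclass

# Hypothetical aggressiveness levels as (prefetch degree, prefetch distance).
# These values are illustrative and do not come from the paper.
LEVELS = [(1, 4), (2, 8), (4, 16)]


@dataclass
class ThreadPrefetchState:
    level: int = 0   # index into LEVELS
    issued: int = 0  # prefetches issued in the current interval
    useful: int = 0  # prefetched lines later hit by a demand access


def should_drop(requester: int, sharers: set, wants_exclusive: bool) -> bool:
    """Filter: drop a prefetch that would invalidate copies held by other cores.

    Assumed rule: only a prefetch asking for exclusive ownership invalidates
    remote copies, so read-only prefetches are always allowed through.
    """
    other_holders = sharers - {requester}
    return wants_exclusive and bool(other_holders)


def retune(state: ThreadPrefetchState, high: float = 0.75, low: float = 0.40) -> int:
    """Adjust one thread's level from its interval accuracy, then reset counters."""
    if state.issued > 0:
        accuracy = state.useful / state.issued
        if accuracy >= high:
            # Accurate prefetching: step up, capped at the most aggressive level.
            state.level = min(state.level + 1, len(LEVELS) - 1)
        elif accuracy < low:
            # Inaccurate prefetching: step down, floored at the least aggressive level.
            state.level = max(state.level - 1, 0)
    # Start a fresh measurement interval.
    state.issued = 0
    state.useful = 0
    return state.level
```

In this sketch each thread owns one `ThreadPrefetchState`, `should_drop` is consulted before a prefetch is issued, and `retune` is called once per sampling interval. A real mechanism would likely combine several feedback signals and would track sharers through the directory rather than a Python set.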
