Abstract

An increasing number of supercomputers are built on multi-core processors with shared caches. Conflicting accesses to the shared cache from different threads or processes, however, become a performance bottleneck for parallel applications. Cache partitioning allocates cache resources exclusively to each process according to its demands; conflicts are avoided by restricting each process's accesses to its own private portion of the shared cache. This paper studies shared cache partitioning for balanced MPI parallel applications on CMP architectures and presents a performance-oriented cache partitioning framework comprising Spatial-Level Cache Partitioning (SLCP), Time-Level Cache Partitioning (TLCP), and their evaluation. We evaluate SLCP and TLCP on a quad-core simulator. Experiments show that SLCP and TLCP outperform the traditional LRU cache replacement policy in IPC throughput and miss rate. In particular, for large workloads, TLCP outperforms LRU by up to 20% and by 8.7% on average.
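The kind of partitioning the abstract describes is often realized as way partitioning of a set-associative shared cache: on a miss, a core may only evict lines from the ways assigned to it, so its accesses stay within its private share. The sketch below illustrates such a partition-aware victim selection; the names (`cache_set_t`, `way_mask`, `select_victim`) and the LRU bookkeeping are illustrative assumptions, not the paper's actual SLCP/TLCP implementation.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_WAYS 16  /* associativity of the shared last-level cache (assumed) */

/* One set of a set-associative cache with per-line LRU timestamps. */
typedef struct {
    uint64_t tag[NUM_WAYS];
    uint64_t lru_stamp[NUM_WAYS];   /* smaller value = older access */
    bool     valid[NUM_WAYS];
} cache_set_t;

/*
 * Pick a victim way for the requesting core.  way_mask holds one bit per
 * way; only ways whose bit is set belong to this core's partition, so the
 * search (and therefore any eviction) is confined to the core's private
 * share of the shared cache.
 */
static int select_victim(const cache_set_t *set, uint16_t way_mask)
{
    int victim = -1;
    uint64_t oldest = UINT64_MAX;

    for (int w = 0; w < NUM_WAYS; w++) {
        if (!(way_mask & (1u << w)))
            continue;                 /* way owned by another partition */
        if (!set->valid[w])
            return w;                 /* free way: fill it immediately */
        if (set->lru_stamp[w] < oldest) {
            oldest = set->lru_stamp[w];
            victim = w;
        }
    }
    return victim;                    /* LRU way within this partition */
}
```

A spatial-level scheme would fix each core's `way_mask` according to its measured demand, while a time-level scheme could adjust the masks periodically at runtime; both reduce to restricting replacement to a private subset of ways as above.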

