Abstract

In a chip multiprocessor (CMP) with shared caches, the last-level cache is distributed across all the cores. This increases on-chip communication delay and thus degrades processor performance. Replication can be introduced in shared caches to reduce this delay. However, current proposals take into account neither the access characteristics of the blocks being replicated nor how to make the best use of the replicas, which limits their performance benefit. In this paper, we observe that the reusability of cache blocks strongly influences the effectiveness of a replication scheme. Based on this observation, we propose Dynamic Reusability-based Replication (DRR), a novel cache design that manages replicas efficiently using the blocks' reuse patterns. DRR monitors the access patterns of recently referenced cache blocks and replicates blocks with high reusability to appropriate L2 slices, where the replicated copies can be shared by nearby cores. We evaluate DRR on a 16-core system using the SPLASH-2 and PARSEC benchmarks. DRR improves performance by 30% on average over a conventional shared cache design, 16% over Victim Replication (VR), 8% over Adaptive Selective Replication (ASR), and 25% over R-NUCA.
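The core idea, replicating only blocks that show high reuse, can be illustrated with a toy software model. This is a minimal sketch under stated assumptions, not the DRR hardware design: the abstract does not specify how reuse is measured or where replicas are placed, so the per-core reuse counter, the `REUSE_THRESHOLD` value, the one-slice-per-core layout, and the choice to place the replica in the requester's own slice are all hypothetical simplifications.

```python
from collections import defaultdict

REUSE_THRESHOLD = 3  # hypothetical: remote accesses by one core before replicating


class ReuseMonitor:
    """Toy model of reuse-driven replication in a sliced shared L2 (illustrative only)."""

    def __init__(self, threshold=REUSE_THRESHOLD):
        self.threshold = threshold
        # (block address, requesting core) -> remote accesses seen so far
        self.reuse = defaultdict(int)
        # block address -> set of L2 slices currently holding a replica
        self.replicas = defaultdict(set)

    def access(self, block, core, home_slice):
        """Record one L2 access and return the slice that serves it."""
        local_slice = core  # assumption: one L2 slice per core, indexed by core id
        if local_slice == home_slice or local_slice in self.replicas[block]:
            return local_slice  # served locally, no cross-chip traversal
        # Remote access: count it as evidence of reuse by this core.
        self.reuse[(block, core)] += 1
        if self.reuse[(block, core)] >= self.threshold:
            # Reuse is high enough: keep a replica nearby for future accesses.
            self.replicas[block].add(local_slice)
        return home_slice  # this access is still served by the home slice
```

With `ReuseMonitor(threshold=2)`, a core that repeatedly reads a block homed in a distant slice is served remotely twice and locally from the third access onward, while a block touched only once is never replicated. A real design would also have to handle replica eviction, coherence, and sharing among neighbouring cores, none of which this sketch models.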
