Abstract

As on-chip communication delays and the working sets of commercial and scientific workloads grow, the L2 caches of chip multiprocessors (CMPs) come under heavy pressure. There are two basic L2 cache designs. A shared L2 cache maximizes aggregate cache capacity and minimizes off-chip memory requests. A private L2 cache minimizes delays on global wires and cache access time. Recent hybrid designs use replication to balance latency and capacity, but they require complicated lookup and coherence mechanisms that increase latency or fail to scale to high core counts. Our experiments with a tiled architecture show that communication traffic is imbalanced across tiles and that utilization differs significantly from one L2 cache to another. Based on this observation, we propose a novel adaptive replication policy (ARP) for tiled shared caches, a mechanism that regularly checks workload behavior to control replication. ARP replicates cache blocks only when the benefit of replication is larger than the cost. Simulations of 16-core CMPs show that ARP provides better performance: communication traffic is reduced by 3%–48%, average access distance is reduced by 3%–52%, and the utilization ratio of aggregate L2 cache capacity is increased by 60%–350%.
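
The rule that a block is replicated only when the benefit exceeds the cost can be illustrated with a minimal sketch. The abstract does not define either quantity, so everything below is an assumption made for illustration: the benefit is modeled as network cycles saved by serving remote hits locally, the cost as off-chip penalty for reuses lost when the replica evicts a local block, and the names `BlockStats`, `cycles_per_hop`, and `offchip_penalty` together with their default values are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    """Per-block counters gathered over one checking interval (hypothetical)."""
    remote_hits: int   # hits this tile served from the block's remote home tile
    hops_to_home: int  # network hops between this tile and the home tile


def replication_benefit(stats: BlockStats, cycles_per_hop: int) -> int:
    # Assumed model: a local replica saves the request and reply traversal
    # (two trips over hops_to_home) for every remote hit.
    return stats.remote_hits * 2 * stats.hops_to_home * cycles_per_hop


def replication_cost(victim_reuses: int, offchip_penalty: int) -> int:
    # Assumed model: the replica evicts a local block; each reuse the victim
    # would have seen now pays the off-chip penalty.
    return victim_reuses * offchip_penalty


def should_replicate(stats: BlockStats, victim_reuses: int,
                     cycles_per_hop: int = 4, offchip_penalty: int = 300) -> bool:
    """Replicate only when the estimated benefit is strictly larger than the cost."""
    benefit = replication_benefit(stats, cycles_per_hop)
    cost = replication_cost(victim_reuses, offchip_penalty)
    return benefit > cost


if __name__ == "__main__":
    hot_block = BlockStats(remote_hits=20, hops_to_home=3)
    cold_block = BlockStats(remote_hits=10, hops_to_home=3)
    print(should_replicate(hot_block, victim_reuses=1))
    print(should_replicate(cold_block, victim_reuses=1))
```

In this sketch the decision is re-evaluated with fresh counters at each interval, which is one way to read "regularly checks workload behavior"; the actual ARP mechanism may measure benefit and cost differently.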

