Abstract

Chip multiprocessors (CMP) with more cores have more traffic to the last-level cache (LLC) . Without a corresponding increase in LLC bandwidth, such traffic cannot be sustained, resulting in performance degradation. Previous research focused on data placement techniques to improve access latency in Non-Uniform Cache Architectures (NUCA) . Placing data closer to the referring core reduces traffic in cache interconnect. However, earlier data placement work did not account for the frequency with which specific memory references are accessed. The difficulty of tracking access frequency for all memory references is one of the main reasons why it was not considered in NUCA data placement. In this research, we present a hardware-assisted solution called ACTION ( A daptive C ache Block Migra tion ) to track the access frequency of individual memory references and prioritize placement of frequently referred data closer to the affine core. ACTION mechanism implements cache block migration when there is a detectable change in access frequencies due to a shift in the program phase. ACTION counts access references in the LLC stream using a simple and approximate method and uses a straightforward placement and migration solution to keep the hardware overhead low. We evaluate ACTION on a 4-core CMP with a 5x5 mesh LLC network implementing a partitioned D-NUCA against workloads exhibiting distinct asymmetry in cache block access frequency. Our simulation results indicate that ACTION can improve CMP performance by up to 7.5% over state-of-the-art (SOTA) D-NUCA solutions.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call