Abstract

The design of an effective last-level cache (LLC) in general, and of an effective cache replacement/partitioning algorithm in particular, is critical to overall system performance. The processor's ability to hide the LLC miss penalty differs widely from one miss to another. The more instructions the processor manages to issue during a miss, the better it can hide the miss penalty and the lower the cost of that miss. This nonuniformity in the processor's ability to hide LLC miss latencies, and the resultant nonuniformity in the performance impact of LLC misses, opens up an opportunity for a new cost-sensitive cache replacement algorithm. This paper makes two key contributions. First, it proposes a framework for estimating the costs of cache blocks at run-time based on the processor's ability to (partially) hide their miss latencies. Second, it proposes a simple, low-hardware-overhead, yet effective cache replacement algorithm that is locality-aware and cost-sensitive (LACS). LACS is thoroughly evaluated using a detailed simulation environment. It speeds up 12 LLC-performance-constrained SPEC CPU2006 benchmarks by up to 51% and by 11% on average. When evaluated on a dual-/quad-core CMP with a shared LLC, LACS significantly outperforms LRU in terms of both performance and fairness, achieving improvements of up to 54%.
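
As a rough illustration of the cost idea described above, the following C sketch counts the instructions the core issues while a miss is outstanding, marks blocks whose misses were poorly hidden as high-cost, and then prefers low-cost, weak-locality blocks as eviction victims. All structure names, fields, and thresholds (ASSOC, COST_THRESHOLD, the locality counter) are assumptions made for illustration only, not the paper's actual hardware mechanism.

    /*
     * Minimal sketch of a locality-aware, cost-sensitive replacement policy
     * in the spirit of the abstract. Not the paper's exact design.
     */
    #include <stdint.h>
    #include <stdbool.h>

    #define ASSOC 16              /* assumed LLC associativity                */
    #define COST_THRESHOLD 64     /* assumed "few instructions issued" cutoff */

    typedef struct {
        uint64_t tag;
        bool     valid;
        bool     high_cost;       /* the miss latency was poorly hidden       */
        uint8_t  locality;        /* e.g., a small re-reference counter       */
    } llc_block_t;

    /* Called when a miss to this block completes: issued_insts is the number
     * of instructions the core issued while the miss was outstanding.        */
    static void update_cost(llc_block_t *blk, uint64_t issued_insts)
    {
        /* Few issued instructions => the core stalled => the miss was costly. */
        blk->high_cost = (issued_insts < COST_THRESHOLD);
    }

    /* Victim selection sketch: prefer a low-cost block with weak locality,
     * then any weak-locality block, then fall back to way 0.                 */
    static int select_victim(llc_block_t set[ASSOC])
    {
        for (int w = 0; w < ASSOC; w++)
            if (!set[w].valid)
                return w;                    /* free way first                */

        for (int w = 0; w < ASSOC; w++)
            if (!set[w].high_cost && set[w].locality == 0)
                return w;                    /* cheap miss, weak locality     */

        for (int w = 0; w < ASSOC; w++)
            if (set[w].locality == 0)
                return w;                    /* any weak-locality block       */

        return 0;                            /* fallback                      */
    }

In this sketch the cost estimate is a single bit per block, which keeps the hardware overhead small, consistent with the low-overhead goal stated in the abstract; the precise counters and thresholds used by LACS are described in the paper itself.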
