Abstract
The design of cache systems for Chip Multiprocessors (CMPs) faces many challenges because future CMPs will have more cores and greater on-chip cache capacity. There are two basic design schemes for the L2 cache: the private scheme, in which each L2 slice is treated as a private L2 cache, and the shared scheme, in which all L2 slices are treated as one large L2 cache shared by all cores. Private caches provide the lowest hit latency but reduce the total effective cache capacity. A shared L2 cache increases the effective cache capacity but has long hit latencies when data is on a remote tile. This paper presents a new Controlled Replication (CR) policy that reduces the capacity occupied by redundant shared replicas. The new CR policy provides greater effective capacity than the victim replication scheme and lower hit latency than the shared scheme. We evaluate the various schemes using full-system simulation of parallel applications. Results show that CR reduces the average memory access latency of the shared scheme by 13% on average, providing better overall performance than both the victim replication and shared schemes.