Abstract

As an effective technique for hiding memory access latency, data prefetching, in both its hardware and software forms, is widely used to alleviate the "memory wall" problem. Current software prefetching typically prefetches data into the L1 cache. However, this strategy suffers from issues such as poor timeliness and over-prefetching. To address these issues, we propose CSPM, a coordinated software prefetching mechanism for multi-level caches. To further improve memory performance, CSPM inserts prefetch instructions that target multiple cache levels according to the access pattern and cache utilization, instead of inserting prefetch instructions for the L1 cache only. In this way, CSPM prefetches data into different cache levels in a coordinated manner. We implement CSPM on top of the software prefetching framework in the GCC compiler, and use the STREAM and SPECfp 2006 benchmark suites to evaluate its effectiveness. Results show that, compared to prefetching data into the L1 cache only, CSPM delivers an average speedup of 1.37x, 2.49x, and 1.04x for single-core STREAM, multi-core STREAM, and SPECfp 2006, respectively.
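
To make the idea of prefetching into more than one cache level concrete, the following is a minimal hand-written sketch, not the compiler pass described above. It uses GCC's `__builtin_prefetch(addr, rw, locality)` builtin on a STREAM-style triad loop: a far-ahead prefetch with a lower locality hint stages data in an outer cache level, and a near-ahead prefetch with the highest locality hint brings it into L1 shortly before use. The function name `triad_prefetch` and the distances `L1_DIST` and `L2_DIST` are hypothetical choices for illustration and are not taken from the paper; which cache level each locality hint maps to depends on the target.

```c
#include <stddef.h>

/* Hypothetical prefetch distances in elements; illustrative values only,
 * not taken from the paper. Real values depend on latency and loop cost. */
#define L1_DIST 16   /* near: bring data into L1 shortly before use */
#define L2_DIST 128  /* far: stage data in an outer cache level earlier */

/* STREAM-style triad: a[i] = b[i] + s * c[i]. */
void triad_prefetch(double *a, const double *b, const double *c,
                    double s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Far-ahead prefetch for reading (rw = 0) with moderate temporal
         * locality (2). On x86 GCC typically emits prefetcht1 for this,
         * but the mapping to a cache level is target-dependent. */
        if (i + L2_DIST < n) {
            __builtin_prefetch(&b[i + L2_DIST], 0, 2);
            __builtin_prefetch(&c[i + L2_DIST], 0, 2);
        }
        /* Near-ahead prefetch with high temporal locality (3), which asks
         * for the data to be kept in all cache levels, including L1. */
        if (i + L1_DIST < n) {
            __builtin_prefetch(&b[i + L1_DIST], 0, 3);
            __builtin_prefetch(&c[i + L1_DIST], 0, 3);
        }
        a[i] = b[i] + s * c[i];
    }
}
```

For simplicity the sketch issues prefetches on every iteration; a tuned version would issue one per cache line and choose the distances from measured latencies, which is the kind of decision the abstract attributes to access-pattern and cache-utilization analysis.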
