Abstract

Data prefetching is an important mechanism for hiding memory latency in single-threaded desktop workloads. For multi-threaded commercial workloads, however, prefetching offers much more modest performance improvements at a high cost in cache power and in bandwidth to the higher-level caches. This paper shows that by combining speculation with a selective prefetching scheme, we can reduce the cache access power overhead while improving performance. We demonstrate that "likely-to-miss" load instructions can be accurately identified, and we propose two hardware-based techniques for improving load latencies in multi-threaded commercial workloads. First, we modify a next-four-lines prefetching scheme to perform the prefetch only for likely-to-miss loads. Second, we forward the addresses of likely-to-miss loads to the L2 and L3 caches for tag look-up immediately after address translation. Combined, these two techniques reduce the extra cache access power of the L3 cache by up to 53% while slightly improving performance, compared with a simple next-four-lines prefetcher, on standard commercial-workload benchmarks.
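The gating idea in the first technique can be illustrated with a minimal software model. This is a sketch under stated assumptions, not the paper's hardware: the per-PC saturating-counter predictor, its threshold, and the fully associative line cache are illustrative choices I introduce here; only the next-four-lines degree and the "prefetch only on likely-to-miss loads" gate come from the abstract.

```python
# Toy model of selective next-four-lines prefetching, gated by a
# per-PC "likely-to-miss" predictor. Assumptions (not from the paper):
# the saturating-counter predictor design and the simplified cache.

LINE = 64    # assumed cache-line size in bytes
DEGREE = 4   # next-four-lines prefetch degree (from the abstract)

class MissPredictor:
    """Per-PC saturating counter: a load is predicted likely-to-miss
    once its counter reaches an (illustrative) threshold."""
    def __init__(self, max_count=3, threshold=2):
        self.counters = {}
        self.max_count = max_count
        self.threshold = threshold

    def likely_to_miss(self, pc):
        return self.counters.get(pc, 0) >= self.threshold

    def update(self, pc, missed):
        c = self.counters.get(pc, 0)
        self.counters[pc] = min(c + 1, self.max_count) if missed else max(c - 1, 0)

class Cache:
    """Fully associative line cache; just enough to count prefetch issues."""
    def __init__(self):
        self.lines = set()
        self.prefetches_issued = 0

    def access(self, addr):
        line = addr // LINE
        hit = line in self.lines
        self.lines.add(line)
        return hit

    def prefetch_next_lines(self, addr):
        line = addr // LINE
        for i in range(1, DEGREE + 1):
            if (line + i) not in self.lines:
                self.lines.add(line + i)
                self.prefetches_issued += 1

def run(trace, selective):
    cache, pred = Cache(), MissPredictor()
    for pc, addr in trace:
        predicted_miss = pred.likely_to_miss(pc)
        hit = cache.access(addr)
        pred.update(pc, not hit)
        # Always-prefetch issues next-four-lines on every load;
        # the selective scheme issues only for predicted likely-to-miss loads.
        if not selective or predicted_miss:
            cache.prefetch_next_lines(addr)
    return cache.prefetches_issued

# Toy trace: pc=1 streams with a large stride (misses often);
# pc=2 repeatedly reuses one line (always hits after warm-up).
trace = [(1, i * LINE * 8) for i in range(32)] + [(2, 0)] * 32

n_always = run(trace, selective=False)
n_selective = run(trace, selective=True)
```

In this toy trace the selective scheme issues fewer prefetches than the always-prefetch baseline (and none at all for the cache-resident load at pc=2), which is the source of the tag-lookup power savings the abstract reports; the exact reduction depends entirely on the workload and predictor.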
