Abstract

Data prefetching is an important mechanism for hiding memory latency in single-threaded desktop workloads. For multi-threaded commercial workloads, however, prefetching offers much more modest performance improvements at a high cost in cache power and in bandwidth to the higher-level caches. This paper shows that by combining speculation with a selective prefetching scheme, we can reduce the cache access power overhead while improving performance. We demonstrate that "likely-to-miss" load instructions can be accurately identified, and we propose two hardware-based techniques for improving load latencies in multi-threaded commercial workloads. First, we modify a next-four-lines prefetching scheme to perform the prefetch only for likely-to-miss loads. Second, we forward the addresses of likely-to-miss loads to the L2 and L3 caches for tag look-up immediately after address translation. Combined, these two techniques reduce the extra cache access power of the L3 cache by up to 53% while slightly improving performance, compared with a simple next-four-lines prefetcher, on standard commercial-workload benchmarks.
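The gating idea in the first technique can be illustrated with a minimal software model. This is a sketch under stated assumptions, not the paper's hardware: the per-PC saturating-counter predictor, its threshold, and the fully associative line cache are illustrative choices I introduce here; only the next-four-lines degree and the "prefetch only on likely-to-miss loads" gate come from the abstract.

```python
# Toy model of selective next-four-lines prefetching, gated by a
# per-PC "likely-to-miss" predictor. Assumptions (not from the paper):
# the saturating-counter predictor design and the simplified cache.

LINE = 64    # assumed cache-line size in bytes
DEGREE = 4   # next-four-lines prefetch degree (from the abstract)

class MissPredictor:
    """Per-PC saturating counter: a load is predicted likely-to-miss
    once its counter reaches an (illustrative) threshold."""
    def __init__(self, max_count=3, threshold=2):
        self.counters = {}
        self.max_count = max_count
        self.threshold = threshold

    def likely_to_miss(self, pc):
        return self.counters.get(pc, 0) >= self.threshold

    def update(self, pc, missed):
        c = self.counters.get(pc, 0)
        self.counters[pc] = min(c + 1, self.max_count) if missed else max(c - 1, 0)

class Cache:
    """Fully associative line cache; just enough to count prefetch issues."""
    def __init__(self):
        self.lines = set()
        self.prefetches_issued = 0

    def access(self, addr):
        line = addr // LINE
        hit = line in self.lines
        self.lines.add(line)
        return hit

    def prefetch_next_lines(self, addr):
        line = addr // LINE
        for i in range(1, DEGREE + 1):
            if (line + i) not in self.lines:
                self.lines.add(line + i)
                self.prefetches_issued += 1

def run(trace, selective):
    cache, pred = Cache(), MissPredictor()
    for pc, addr in trace:
        predicted_miss = pred.likely_to_miss(pc)
        hit = cache.access(addr)
        pred.update(pc, not hit)
        # Always-prefetch issues next-four-lines on every load;
        # the selective scheme issues only for predicted likely-to-miss loads.
        if not selective or predicted_miss:
            cache.prefetch_next_lines(addr)
    return cache.prefetches_issued

# Toy trace: pc=1 streams with a large stride (misses often);
# pc=2 repeatedly reuses one line (always hits after warm-up).
trace = [(1, i * LINE * 8) for i in range(32)] + [(2, 0)] * 32

n_always = run(trace, selective=False)
n_selective = run(trace, selective=True)
```

In this toy trace the selective scheme issues fewer prefetches than the always-prefetch baseline (and none at all for the cache-resident load at pc=2), which is the source of the tag-lookup power savings the abstract reports; the exact reduction depends entirely on the workload and predictor.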
