Abstract

Increasing the size of cache memory is a common approach for reducing miss rates and increasing CPU performance. Doing so, however, increases the static and dynamic energy consumption of the cache. Compression can be used to increase the effective capacity of cache memory without physically increasing its size. Compression can also reduce the physical size of the cache, and therefore its energy consumption, while maintaining a reasonable effective capacity. Unfortunately, decompression latency is incurred when accessing compressed data. This latency sits on the critical execution path of the processor and can significantly impact performance, especially in the L1 cache. Previous work has used cache prefetching techniques to hide the latency of lower-level memory accesses. Our work proposes combining data prefetching and compression techniques to reduce the impact of decompression latency and improve the feasibility of compression in L1 caches. We evaluate the performance of Last Outcome (LO), Stride (S), and Two-Level (2L) prefetching, as well as hybrid combinations of these methods (S/LO & 2L/S), in combination with Base-Delta-Immediate (BΔI) compression. The results demonstrate that using BΔI in combination with data prefetching improves performance over BΔI compression alone in the L1 data cache. We find that a 4KB Hybrid S/LO prefetcher yields an average speedup of 1.7% and a 1.5% improvement in the energy-delay product of the CPU versus BΔI alone.
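To illustrate the compression scheme the abstract evaluates, the following is a minimal sketch of the base-delta idea behind BΔI: a cache line of words is stored as one full-width base value plus narrow deltas when every word lies close to the base. The function name, word layout, and single-base/single-delta-size simplification are illustrative assumptions, not the paper's exact design (real BΔI also tries a zero "immediate" base and multiple base/delta size combinations).

```python
def bdi_compress(words, delta_bytes=1):
    """Try base-delta compression on a list of word values.

    Returns (base, deltas) if every word fits as a signed
    `delta_bytes`-byte delta from the base word, else None
    (meaning the line would stay uncompressed).
    """
    base = words[0]
    limit = 1 << (8 * delta_bytes - 1)        # signed range: [-limit, limit)
    deltas = [w - base for w in words]
    if all(-limit <= d < limit for d in deltas):
        return base, deltas                   # compressible line
    return None                               # incompressible line

# Pointer-like values cluster near a common base, so they compress well:
line = [0x1000, 0x1004, 0x1008, 0x1010]
print(bdi_compress(line))                     # → (4096, [0, 4, 8, 16])
```

The compressed form here needs one full word plus one byte per word instead of a full word per word, which is the capacity gain the abstract trades against decompression latency.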
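Of the prefetchers evaluated, the Stride (S) scheme can be sketched simply: track, per load instruction, the last address and last observed stride, and predict the next address once the same stride repeats. The class name, table layout, and two-access confirmation rule below are illustrative assumptions, not the paper's exact implementation.

```python
class StridePrefetcher:
    """Minimal sketch of a stride prefetcher.

    Keyed by load PC; a prefetch is issued only after the same
    stride is seen twice in a row (a confirmed stride).
    """
    def __init__(self):
        self.table = {}  # pc -> (last_addr, last_stride)

    def access(self, pc, addr):
        prefetch = None
        if pc in self.table:
            last_addr, last_stride = self.table[pc]
            stride = addr - last_addr
            if stride == last_stride and stride != 0:
                prefetch = addr + stride       # stride confirmed: predict next
            self.table[pc] = (addr, stride)
        else:
            self.table[pc] = (addr, 0)         # first sighting: no stride yet
        return prefetch

pf = StridePrefetcher()
for a in (100, 108, 116):                      # same load walking an array
    hit = pf.access(pc=0x400, addr=a)
print(hit)                                     # → 124 (stride of 8 confirmed)
```

In the hybrid S/LO configuration the abstract highlights, a predictor along these lines would fall back to a last-outcome prediction when no stable stride is detected.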
