Abstract

General-Purpose Graphics Processing Units (GPGPUs) exploit several levels of caches to hide memory latency and supply data to thousands of simultaneously executing threads. The L1 data cache and the L2 cache are critical to GPGPU performance, as the L1 data cache must provide data for all threads within its Streaming Multiprocessor (SM) and the L2 cache must service memory requests from all threads across all SMs. In this paper, we exploit compression to increase the effective capacity of the L1 data cache and the L2 cache and to improve the performance and energy efficiency of GPGPUs. Our work is motivated by the observation that many cache blocks hold values with low dynamic range, i.e., the differences between the values within a cache block are small. Removing this redundancy through compression reduces the effective cache block width, enabling more blocks to be stored in the cache and improving performance. We also exploit the opportunities provided by cache compression to reduce energy. Compression reduces the effective size of cache blocks, which lowers dynamic power because only a subset of the full cache block width is accessed for compressed data. Furthermore, cache compression increases the number of idle banks, so static power can be reduced by power gating them. Evaluation results reveal that, on average, cache compression improves performance by 10.1% and reduces cache energy by 8%.
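The abstract does not specify the exact compression algorithm, but the "low dynamic range" observation is commonly exploited by base-delta style schemes. The following is a minimal illustrative sketch (not the paper's implementation): a cache block of full-width words is stored as one base value plus narrow signed deltas, and a block is left uncompressed if any delta does not fit.

```python
def compress_block(words, delta_bits=8):
    """Base-delta sketch: store one full-width base plus narrow deltas.

    Returns (base, deltas) if every word lies within a signed
    `delta_bits` range of the base, else None (block stays uncompressed).
    The parameter name and interface are illustrative assumptions.
    """
    base = words[0]
    lo, hi = -(1 << (delta_bits - 1)), (1 << (delta_bits - 1)) - 1
    deltas = [w - base for w in words]
    if all(lo <= d <= hi for d in deltas):
        return base, deltas
    return None

def decompress_block(base, deltas):
    """Reconstruct the original words from base + deltas."""
    return [base + d for d in deltas]
```

For example, a block of eight 32-bit words (32 bytes) whose values all fall within an 8-bit delta of the first word would shrink to one 4-byte base plus eight 1-byte deltas (12 bytes), so fewer cache data-array bytes need to be read on an access.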
