Scrabble: A Fine-Grained Cache with Adaptive Merged Block

Chao Zhang, Yuan Zeng, Xiaochen Guo

doi:10.1109/tc.2019.2939809

Abstract

A large fraction of microprocessor energy is consumed by data movement in the system. One of the reasons is the inefficiency of conventional cache design. Conventional caches use cache blocks larger than a word to exploit spatial locality. However, many applications use only a small part of a cache block before it is evicted. Transferring and storing unused data wastes bandwidth, energy, and limited cache space. Prior work on fine-grained caches reduces the data access and storage granularity to cut the amount of unused data. However, small data blocks typically require greater metadata and control overhead. Sharing the common bits among the tags of fine-grained blocks can reduce the metadata overhead, but the constraints on which fine-grained blocks can share tag bits can cause fragmentation. This work proposes Scrabble, a fine-grained cache that can merge multiple non-contiguous fine-grained blocks into a variable-size merged block. The length of the shared tag is maximized to reduce the metadata overhead. Space utilization is improved by supporting merged blocks of variable size. The control overhead can be reduced by moving the merged block as a unit from memory to the last-level cache. For applications with poor spatial locality, the Scrabble cache can achieve a performance improvement of more than 40 percent. Even for applications with good spatial locality, the speedup is still more than 7 percent. Overall, for the evaluated set of benchmarks, the Scrabble cache achieves an average of 2.41× effective capacity over a baseline cache of the same capacity, which leads to a 16.7 percent performance improvement and an 11 percent on-chip energy reduction. Compared to a state-of-the-art fine-grained cache, the Scrabble cache achieves 1.25× effective capacity, a 7.9 percent speedup, and a 5.8 percent on-chip energy reduction.
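The metadata saving from sharing tag bits can be illustrated with a small sketch. It is not the paper's hardware design: the function names, the 8-bit tag width, and the cost model (one copy of the shared prefix plus the remaining bits of each tag) are assumptions chosen only to make the arithmetic concrete.

```python
def shared_prefix_bits(tags, tag_bits):
    """Return how many leading bits are identical across all tags.

    Hypothetical helper: `tags` are integers, each `tag_bits` wide.
    """
    if not tags:
        return 0
    diff = 0
    for t in tags[1:]:
        # Accumulate a 1 in every bit position where a tag differs from the first.
        diff |= tags[0] ^ t
    # Bits above the highest differing position are common to every tag.
    return tag_bits - diff.bit_length()


def tag_storage_bits(tags, tag_bits):
    """Compare tag storage with and without a shared prefix (assumed cost model)."""
    shared = shared_prefix_bits(tags, tag_bits)
    separate = len(tags) * tag_bits                  # every block keeps a full tag
    merged = shared + len(tags) * (tag_bits - shared)  # one prefix + per-block remainders
    return shared, separate, merged


# Three fine-grained blocks whose 8-bit tags agree in the top four bits.
example_tags = [0b10110000, 0b10110101, 0b10111100]
print(tag_storage_bits(example_tags, 8))
```

Under these assumptions, the longer the prefix shared by the blocks in a merged group, the fewer tag bits are stored per block. This is the quantity the abstract describes as being maximized.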

Full Text