In a sectored cache, each cache line is divided into several subblocks, and each subblock serves as a basic coherence unit. Cache lines can therefore be partially invalidated, which eliminates false sharing on invalidate-based multiprocessors. Sectored caches often include a facility, called bounteous transfer, that supplies extra subblocks after transferring the missed subblock on a read miss. Unfortunately, previous work on sectored caches concentrated mainly on solving the false sharing problem while overlooking the prefetching effects of bounteous transfer. In this paper, we evaluate the performance impact of bounteous transfer on a MESI-based sectored cache. Three types of bounteous transfer are evaluated: bounteous transfer with valid subblocks (BT-V), bounteous transfer with clean subblocks (BT-C), and bounteous transfer disabled (No-BT). We simulated the execution of typical benchmarks (FFT, LU, Radix, and SOR) on the MESI-based sectored cache. Two metrics, U-rate and R-rate, are proposed to help observe sharing granularity and coherence overhead. Evaluation results show that different benchmarks work better with different kinds of bounteous transfer, and that using bounteous transfer carelessly may degrade performance.