Abstract

Modern GPGPUs employ multi-threading to hide the long off-chip memory access latency caused by frequent cache misses. However, the limited cache capacity shared by thousands of concurrently running warps introduces serious cache contention, which results in high cache miss rates and hurts overall GPGPU performance. To address this problem, this paper proposes a novel cache technique (Re-Cache) that exploits locality characteristics and reconfigures the memory hierarchy, including the unused shared memory space and the idle registers of pending warps, without extra storage overhead for GPGPUs. To further improve the utilization efficiency of the Re-Cache space, this paper proposes a dynamic and reconfigurable cache organization for Re-Cache that explores the differences and changes in spatial and temporal locality among the running applications. Re-Cache dynamically reconfigures the cache organization into an adaptive prefetch region and a victim region according to these locality differences and changes, capturing the locality characteristics of each load instruction. For high-locality load instructions, the prefetch region stores the subsequent data blocks to exploit the spatial locality of the running threads across all streaming multiprocessors and warps, while the victim region collects the data blocks evicted from the upper memory hierarchy to exploit their temporal locality. Experimental results demonstrate that the proposed Re-Cache reduces cache misses by up to 34.82% and 31.62%, and improves IPC by about 47.17% and 14.96%, compared with state-of-the-art designs on cache-sensitive kernels and benchmarks, respectively.
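The adaptive prefetch/victim policy described above can be illustrated with a toy software model. This is a hypothetical sketch, not the paper's hardware design: it assumes per-load-PC locality counters (names such as `spatial`, `temporal`, and the region dictionaries are invented for illustration) and uses a simplified fill/eviction flow in place of the real SM cache hierarchy.

```python
# Toy software analogy of the Re-Cache idea (assumption: per-load-PC
# locality counters decide how extra cache space is used). NOT the
# actual hardware mechanism from the paper.
from collections import defaultdict

BLOCK = 128  # assumed cache-block size in bytes (hypothetical)

class ReCacheSketch:
    """Route each load by its PC's observed locality: spatial-leaning PCs
    stage the next block in a prefetch region; temporal-leaning PCs keep
    their evicted blocks in a victim region."""

    def __init__(self, capacity=8):
        self.main = {}                   # main cache: block id -> data
        self.prefetch = {}               # prefetch region (spatial locality)
        self.victim = {}                 # victim region (temporal locality)
        self.capacity = capacity
        self.spatial = defaultdict(int)  # per-PC spatial-locality score
        self.temporal = defaultdict(int) # per-PC temporal-locality score

    def access(self, pc, addr):
        blk = addr // BLOCK
        if blk in self.main:             # ordinary hit: temporal reuse
            self.temporal[pc] += 1
            return "hit"
        if blk in self.prefetch:         # speculative prefetch paid off
            self.spatial[pc] += 1
            self.main[blk] = self.prefetch.pop(blk)
            return "prefetch-hit"
        if blk in self.victim:           # evicted block reused soon after
            self.temporal[pc] += 1
            self.main[blk] = self.victim.pop(blk)
            return "victim-hit"
        # Miss: fill the main cache; if this PC leans temporal, park the
        # evicted block in the victim region instead of dropping it.
        if len(self.main) >= self.capacity:
            old, data = self.main.popitem()
            if self.temporal[pc] > self.spatial[pc]:
                self.victim[old] = data
        self.main[blk] = "data"
        # If this PC leans spatial, speculatively stage the next block.
        if self.spatial[pc] >= self.temporal[pc]:
            self.prefetch[blk + 1] = "data"
        return "miss"
```

A streaming load (consecutive addresses) quickly accumulates prefetch hits and keeps being prefetched, while a load that re-touches recently evicted blocks accumulates temporal score and gets victim-region protection instead.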
