Abstract

The performance of General-Purpose Graphics Processing Units (GPGPUs) is frequently limited by off-chip memory bandwidth. To mitigate this bandwidth wall, recent GPUs are equipped with on-chip L1 and L2 caches. However, there has been little work on better utilizing on-chip shared caches in GPGPUs. In this paper, we propose two cache management schemes: write-buffering and read-bypassing. The write-buffering technique utilizes the shared cache for inter-block communication, and thereby reduces DRAM accesses by up to the capacity of the cache. The read-bypassing scheme prevents the shared cache from being polluted by streamed data that are consumed only within a thread-block. The proposed schemes can be selectively applied to global memory instructions using newly defined cache operators. We evaluate the effects of the proposed schemes on several GPGPU applications through simulation. We show that off-chip memory accesses are successfully reduced by the proposed techniques. We also analyze the effectiveness of these methods as the throughput gap between cores and off-chip memory widens.
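The intuition behind read-bypassing can be sketched with a toy cache model: when streamed, use-once data is allowed to allocate lines in a shared cache, it evicts data that is reused across rounds; marking the streamed accesses as bypassing preserves the reusable working set. This is only an illustrative model under assumed parameters (a small fully-associative LRU cache, a 4-line reused working set, 16 streamed addresses per round), not the paper's simulator or its actual cache operators.

```python
from collections import OrderedDict

class SharedCache:
    """Toy fully-associative LRU cache (illustrative model only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # addr -> present; order = LRU order
        self.hits = 0
        self.misses = 0

    def access(self, addr, bypass=False):
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)  # mark as most recently used
            return
        self.misses += 1
        if bypass:
            # Read-bypassing: serve the miss without allocating a line,
            # so streamed data cannot pollute the shared cache.
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the LRU line
        self.lines[addr] = True

def run(bypass_streams):
    cache = SharedCache(capacity=4)
    reused = [0, 1, 2, 3]  # data reused every round (worth caching)
    for round_ in range(8):
        for a in reused:
            cache.access(a)
        # Streamed data: each address touched exactly once, then dead.
        for a in range(100 + round_ * 16, 100 + round_ * 16 + 16):
            cache.access(a, bypass=bypass_streams)
    return cache.hits, cache.misses

hits_no, misses_no = run(bypass_streams=False)
hits_by, misses_by = run(bypass_streams=True)
print(hits_no, misses_no)  # streamed data evicts the reused set: 0 hits
print(hits_by, misses_by)  # reused set stays resident: 28 hits
```

In this model the non-bypassing run gets no hits at all, because each round's 16 streamed lines flush the 4-line cache before the reused data is touched again; with bypassing, the reused set misses only on the first round. The real scheme operates per global memory instruction via cache operators rather than per address, but the pollution effect it avoids is the same.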
