Abstract

The performance of General-Purpose Graphics Processing Units (GPGPUs) is frequently limited by off-chip memory bandwidth. To mitigate this bandwidth wall, recent GPUs are equipped with on-chip L1 and L2 caches. However, there has been little work on better utilizing on-chip shared caches in GPGPUs. In this paper, we propose two cache management schemes: write-buffering and read-bypassing. The write-buffering technique utilizes the shared cache for inter-block communication, and thereby reduces DRAM accesses by up to the capacity of the cache. The read-bypassing scheme prevents the shared cache from being polluted by streamed data that are consumed only within a thread-block. The proposed schemes can be selectively applied to global memory instructions using newly defined cache operators. We evaluate the effects of the proposed schemes on several GPGPU applications through simulation. We show that off-chip memory accesses are successfully reduced by the proposed techniques. We also analyze the effectiveness of these methods as the throughput gap between cores and off-chip memory widens.
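The intuition behind read-bypassing can be sketched with a toy cache model: when streamed, use-once data is allowed to allocate lines in a shared cache, it evicts data that is reused across rounds; marking the streamed accesses as bypassing preserves the reusable working set. This is only an illustrative model under assumed parameters (a small fully-associative LRU cache, a 4-line reused working set, 16 streamed addresses per round), not the paper's simulator or its actual cache operators.

```python
from collections import OrderedDict

class SharedCache:
    """Toy fully-associative LRU cache (illustrative model only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # addr -> present; order = LRU order
        self.hits = 0
        self.misses = 0

    def access(self, addr, bypass=False):
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)  # mark as most recently used
            return
        self.misses += 1
        if bypass:
            # Read-bypassing: serve the miss without allocating a line,
            # so streamed data cannot pollute the shared cache.
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the LRU line
        self.lines[addr] = True

def run(bypass_streams):
    cache = SharedCache(capacity=4)
    reused = [0, 1, 2, 3]  # data reused every round (worth caching)
    for round_ in range(8):
        for a in reused:
            cache.access(a)
        # Streamed data: each address touched exactly once, then dead.
        for a in range(100 + round_ * 16, 100 + round_ * 16 + 16):
            cache.access(a, bypass=bypass_streams)
    return cache.hits, cache.misses

hits_no, misses_no = run(bypass_streams=False)
hits_by, misses_by = run(bypass_streams=True)
print(hits_no, misses_no)  # streamed data evicts the reused set: 0 hits
print(hits_by, misses_by)  # reused set stays resident: 28 hits
```

In this model the non-bypassing run gets no hits at all, because each round's 16 streamed lines flush the 4-line cache before the reused data is touched again; with bypassing, the reused set misses only on the first round. The real scheme operates per global memory instruction via cache operators rather than per address, but the pollution effect it avoids is the same.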
