Abstract

With the rapid growth in demand for massive data processing and the limitations of process technology scaling in microprocessors, GPGPUs have gained increasing attention for their massive data-parallel computing power. A tightly coupled CPU and GPGPU that share the LLC (last-level cache) enable fine-grained workload offloading between the two processors. In this paper, we focus on a data transfer pattern in which the data take the form of independent elements, each of which is processed by the other processor as soon as it is ready. Traditionally, the CPU prepares all the data to be processed by the GPGPU before launching it. This creates long waiting times, and the shared LLC may suffer from cache thrashing if the working set does not fit in the LLC. To alleviate these problems, we propose the LLC buffer, a data transfer mechanism between the CPU and GPGPU built on the shared LLC. The LLC buffer uses part of the LLC storage as one or more stream buffers and stashes each data element as an independent transfer unit. With the help of the LLC buffer, we achieve an average speedup of 1.48x and reduce memory writes (cache evictions) from the LLC by 1346x.
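The sketch below is a purely illustrative software analogue of the streaming pattern the abstract describes, not the paper's hardware mechanism: a single-producer/single-consumer ring buffer through which a "CPU" thread hands each element to a "GPGPU" thread as soon as it is ready, instead of preparing the whole batch first. All names (llc_buffer, CAPACITY, producer, consumer) are assumptions made for the example; the actual LLC partitioning and cache-level transfer unit are not modeled.

```c
/* Illustrative only: element-granularity streaming through a small ring
 * buffer, standing in for the LLC region the paper carves out as a stream
 * buffer. One thread models the CPU (producer), the other the GPGPU
 * (consumer). Identifiers here are hypothetical, not from the paper. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define CAPACITY 64          /* ring slots, stands in for the buffer region */
#define NUM_ELEMENTS 1024    /* independent data elements to transfer */

typedef struct {
    int slots[CAPACITY];
    atomic_size_t head;      /* next slot the producer writes */
    atomic_size_t tail;      /* next slot the consumer reads */
} llc_buffer;

static llc_buffer buf = { .head = 0, .tail = 0 };

/* CPU side: prepare one element at a time and publish it immediately. */
static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < NUM_ELEMENTS; i++) {
        size_t head = atomic_load_explicit(&buf.head, memory_order_relaxed);
        /* spin while the ring is full */
        while (head - atomic_load_explicit(&buf.tail, memory_order_acquire) == CAPACITY)
            ;
        buf.slots[head % CAPACITY] = i * i;               /* "prepare" the element */
        atomic_store_explicit(&buf.head, head + 1, memory_order_release);
    }
    return NULL;
}

/* GPGPU side: consume each element as soon as it becomes visible. */
static void *consumer(void *arg) {
    long sum = 0;
    for (int i = 0; i < NUM_ELEMENTS; i++) {
        size_t tail = atomic_load_explicit(&buf.tail, memory_order_relaxed);
        /* spin while the ring is empty */
        while (atomic_load_explicit(&buf.head, memory_order_acquire) == tail)
            ;
        sum += buf.slots[tail % CAPACITY];                /* "process" the element */
        atomic_store_explicit(&buf.tail, tail + 1, memory_order_release);
    }
    *(long *)arg = sum;
    return NULL;
}

int main(void) {
    pthread_t prod, cons;
    long sum = 0;
    pthread_create(&cons, NULL, consumer, &sum);
    pthread_create(&prod, NULL, producer, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    printf("consumed %d elements, checksum %ld\n", NUM_ELEMENTS, sum);
    return 0;
}
```

The point of the analogue is the contrast with the traditional flow: because the consumer starts as soon as the first element is published, there is no long wait for the full data set, and at any moment only a bounded window of elements (CAPACITY) is live, which mirrors how a bounded LLC stream buffer avoids thrashing the shared cache with the entire working set.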
