Abstract

With the rapid growth in demand for massive data processing and the limitations of process technology scaling in microprocessors, GPGPUs have gained increasing attention for their massive data-parallel computing power. A tightly coupled CPU and GPGPU that share the LLC (last-level cache) enable fine-grained workload offloading between the two processors. In this paper, we focus on a data transfer pattern in which the data take the form of independent elements, each of which is processed by the other processor as soon as it is ready. Traditionally, the CPU prepares all the data to be processed by the GPGPU before launching it. This creates long waiting times, and the shared LLC may suffer from cache thrashing if the working set does not fit in the LLC. To alleviate these problems, we propose the LLC buffer, a data transfer mechanism between the CPU and GPGPU built on the shared LLC. The LLC buffer uses part of the LLC storage as one or more stream buffers and stashes each data element as an independent transfer unit. With the help of the LLC buffer, we achieve an average speedup of 1.48x and reduce memory writes (cache evictions) from the LLC by 1346x.
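The sketch below is a purely illustrative software analogue of the streaming pattern the abstract describes, not the paper's hardware mechanism: a single-producer/single-consumer ring buffer through which a "CPU" thread hands each element to a "GPGPU" thread as soon as it is ready, instead of preparing the whole batch first. All names (llc_buffer, CAPACITY, producer, consumer) are assumptions made for the example; the actual LLC partitioning and cache-level transfer unit are not modeled.

```c
/* Illustrative only: element-granularity streaming through a small ring
 * buffer, standing in for the LLC region the paper carves out as a stream
 * buffer. One thread models the CPU (producer), the other the GPGPU
 * (consumer). Identifiers here are hypothetical, not from the paper. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define CAPACITY 64          /* ring slots, stands in for the buffer region */
#define NUM_ELEMENTS 1024    /* independent data elements to transfer */

typedef struct {
    int slots[CAPACITY];
    atomic_size_t head;      /* next slot the producer writes */
    atomic_size_t tail;      /* next slot the consumer reads */
} llc_buffer;

static llc_buffer buf = { .head = 0, .tail = 0 };

/* CPU side: prepare one element at a time and publish it immediately. */
static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < NUM_ELEMENTS; i++) {
        size_t head = atomic_load_explicit(&buf.head, memory_order_relaxed);
        /* spin while the ring is full */
        while (head - atomic_load_explicit(&buf.tail, memory_order_acquire) == CAPACITY)
            ;
        buf.slots[head % CAPACITY] = i * i;               /* "prepare" the element */
        atomic_store_explicit(&buf.head, head + 1, memory_order_release);
    }
    return NULL;
}

/* GPGPU side: consume each element as soon as it becomes visible. */
static void *consumer(void *arg) {
    long sum = 0;
    for (int i = 0; i < NUM_ELEMENTS; i++) {
        size_t tail = atomic_load_explicit(&buf.tail, memory_order_relaxed);
        /* spin while the ring is empty */
        while (atomic_load_explicit(&buf.head, memory_order_acquire) == tail)
            ;
        sum += buf.slots[tail % CAPACITY];                /* "process" the element */
        atomic_store_explicit(&buf.tail, tail + 1, memory_order_release);
    }
    *(long *)arg = sum;
    return NULL;
}

int main(void) {
    pthread_t prod, cons;
    long sum = 0;
    pthread_create(&cons, NULL, consumer, &sum);
    pthread_create(&prod, NULL, producer, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    printf("consumed %d elements, checksum %ld\n", NUM_ELEMENTS, sum);
    return 0;
}
```

The point of the analogue is the contrast with the traditional flow: because the consumer starts as soon as the first element is published, there is no long wait for the full data set, and at any moment only a bounded window of elements (CAPACITY) is live, which mirrors how a bounded LLC stream buffer avoids thrashing the shared cache with the entire working set.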
