Abstract

Heterogeneous multicore processors that integrate CPU and GPU (Graphics Processing Unit) cores on the same chip pose new challenges for resource sharing, which is crucial for performance. Unlike cores in traditional multicores, the CPU and GPU cores in an integrated architecture can generate very different amounts of cache traffic and exhibit quite diverse temporal and spatial data locality. As a result, a shared last-level cache (LLC) can suffer heavy interference between CPU and GPU accesses, degrading the performance of both. Cache bypassing is a promising method to improve LLC performance and to alleviate resource contention between the CPU and GPU. However, inefficient bypassing may cause significant NoC (Network-on-Chip) congestion and hence performance degradation, particularly for the CPU in a heterogeneous CPU-GPU system with an on-chip ring network. In this paper, we propose a sample-based dynamic cache bypassing method for the shared LLC in a heterogeneous CPU-GPU multicore system. The method samples the LLC miss rates and NoC traffic of both the CPU and the GPU at run time and uses a statistical decision-making model to determine whether to bypass. Our experiments show that, for an integrated CPU-GPU architecture with a ring-based NoC topology, bypassing CPU accesses can be even more important than bypassing GPU accesses. On average, bypassing both CPU and GPU improves CPU performance by 34.30% and GPU performance by 3.20%; bypassing the CPU alone improves CPU performance by 38.09% and GPU performance by 1.11%; and bypassing the GPU alone improves CPU performance by 4.12% and GPU performance by 2.60%.
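The abstract does not spell out the statistical decision model, so the following is only a minimal sketch of the general idea of a sample-based, per-source bypass decision. It assumes a simple threshold rule in place of the paper's model: at the end of each epoch, a source (CPU or GPU) bypasses the LLC when its sampled miss rate is high (little reuse to gain from caching) and the ring NoC still has headroom (so the extra bypassed traffic is less likely to congest it). The names `EpochSample`, `should_bypass`, `decide`, and all threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EpochSample:
    """Hypothetical counters gathered from sampled LLC sets over one epoch."""
    llc_accesses: int   # LLC accesses observed in the sampled sets
    llc_misses: int     # LLC misses observed in the sampled sets
    noc_flits: int      # flits injected into the ring during the epoch
    epoch_cycles: int   # length of the sampling epoch in cycles


def should_bypass(sample: EpochSample,
                  miss_rate_threshold: float = 0.8,
                  noc_util_threshold: float = 0.5,
                  ring_capacity_flits_per_cycle: float = 1.0) -> bool:
    """Return True if this source should bypass the LLC in the next epoch.

    Assumed threshold rule (not the paper's statistical model): bypass only
    when the sampled miss rate shows poor reuse AND ring utilization is
    below a congestion threshold.
    """
    if sample.llc_accesses == 0 or sample.epoch_cycles == 0:
        return False  # no evidence this epoch; keep caching
    miss_rate = sample.llc_misses / sample.llc_accesses
    noc_util = sample.noc_flits / (sample.epoch_cycles * ring_capacity_flits_per_cycle)
    return miss_rate >= miss_rate_threshold and noc_util < noc_util_threshold


def decide(cpu: EpochSample, gpu: EpochSample) -> dict:
    """Make independent bypass decisions for the CPU and GPU sources."""
    return {"cpu": should_bypass(cpu), "gpu": should_bypass(gpu)}


if __name__ == "__main__":
    cpu = EpochSample(llc_accesses=1000, llc_misses=900, noc_flits=2000, epoch_cycles=10000)
    gpu = EpochSample(llc_accesses=5000, llc_misses=2000, noc_flits=8000, epoch_cycles=10000)
    print(decide(cpu, gpu))  # {'cpu': True, 'gpu': False}
```

Deciding separately for each source mirrors the abstract's comparison of bypassing the CPU, the GPU, or both; the NoC-utilization check reflects its warning that careless bypassing can congest the ring.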