A Sample-Based Dynamic CPU and GPU LLC Bypassing Method for Heterogeneous CPU-GPU Architectures

Xin Wang, Wei Zhang

doi:10.1109/trustcom/bigdatase/icess.2017.309

Abstract

Heterogeneous multicore processors that integrate CPU and GPU (Graphics Processing Unit) cores on the same chip pose new challenges for resource sharing, which is crucial for performance. Unlike in traditional multicores, the CPU and GPU cores in the integrated architecture can generate significantly different amounts of cache traffic and exhibit quite diverse temporal or spatial data locality. Sharing the last-level cache (LLC) can therefore cause heavy interference between CPU and GPU LLC accesses, degrading the performance of both. Cache bypassing is a promising method to improve LLC performance and to alleviate resource contention between the CPU and GPU. However, inefficient cache bypassing may lead to significant NoC (Network-on-Chip) traffic congestion and hence performance degradation, particularly for the CPU in a heterogeneous CPU-GPU system with an on-chip ring network. In this paper, we propose a sample-based dynamic cache bypassing method for the shared LLC in heterogeneous CPU-GPU multicore systems. The method samples the LLC miss rates and NoC traffic of both the CPU and GPU at run time and uses a statistical decision-making model to decide whether to bypass. Our experiments show that, for the integrated CPU-GPU architecture with a ring-based NoC topology, bypassing the CPU can be even more important than bypassing the GPU. On average, bypassing both CPU and GPU improves CPU performance by 34.30% and GPU performance by 3.20%; bypassing the CPU alone improves CPU performance by 38.09% and GPU performance by 1.11%; and bypassing the GPU alone improves CPU performance by 4.12% and GPU performance by 2.60%.
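
The abstract does not spell out the statistical decision model, so the following is only a minimal sketch of the general idea of a sample-driven bypass decision, not the authors' method. It assumes a simple two-threshold rule in place of the paper's model: bypass the LLC for a core type when its sampled miss rate is high (the cache is helping little) and the sampled NoC utilization still leaves headroom (bypassing is less likely to congest the ring). The names `Sample`, `should_bypass`, `noc_capacity`, and both threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Counters collected for one core type (CPU or GPU) in one sampling window.

    All fields are hypothetical; the abstract only says that LLC miss rates
    and NoC traffic are sampled at run time.
    """
    llc_accesses: int
    llc_misses: int
    noc_packets: int   # packets observed on the NoC during the window
    noc_capacity: int  # assumed maximum packets the NoC can carry per window

    @property
    def miss_rate(self) -> float:
        # Guard against an empty window to avoid division by zero.
        return self.llc_misses / self.llc_accesses if self.llc_accesses else 0.0

    @property
    def noc_utilization(self) -> float:
        return self.noc_packets / self.noc_capacity if self.noc_capacity else 0.0


def should_bypass(sample: Sample,
                  miss_threshold: float = 0.8,
                  noc_threshold: float = 0.7) -> bool:
    """Illustrative threshold rule (an assumption, not the paper's model).

    Bypass only when the LLC shows a high miss rate AND the NoC is not
    already close to saturation.
    """
    return (sample.miss_rate >= miss_threshold
            and sample.noc_utilization < noc_threshold)


if __name__ == "__main__":
    cpu = Sample(llc_accesses=1000, llc_misses=900, noc_packets=300, noc_capacity=1000)
    gpu = Sample(llc_accesses=1000, llc_misses=900, noc_packets=900, noc_capacity=1000)
    print("bypass CPU:", should_bypass(cpu))
    print("bypass GPU:", should_bypass(gpu))
```

In a real system the decision would be re-evaluated every sampling window, separately for CPU and GPU requests, which is what makes the scheme dynamic; the fixed thresholds here merely stand in for whatever statistical model the paper uses.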

Similar Papers

Reducing Inter-Application Interferences in Integrated CPU-GPU Heterogeneous Architecture
Hao Wen ... Wei Zhang
01 Oct 2018

Performance-Energy Considerations for Shared Cache Management in a Heterogeneous Multicore Processor
Anup Holey ... Vineeth Mekkat
ACM Transactions on Architecture and Code Optimization | VOL. 12
09 Mar 2015

Batch Size Influence on Performance of Graphic and Tensor Processing Units During Training and Inference Phases
Yuriy Kochura ... Alexandr Rokovyi
29 Mar 2019

Cache locking vs. partitioning for real-time computing on integrated CPU-GPU processors
Xin Wang ... Wei Zhang
01 Dec 2016
