Abstract

Heterogeneous multicore processors that integrate CPU cores and data-parallel accelerators such as graphics processing unit (GPU) cores onto the same die raise several new issues for sharing various on-chip resources. The shared last-level cache (LLC) is one of the most important shared resources due to its impact on performance. Accesses to the shared LLC in heterogeneous multicore processors can be dominated by the GPU due to the significantly higher number of concurrent threads supported by the architecture. Under current cache management policies, the CPU applications’ share of the LLC can be significantly reduced in the presence of competing GPU applications. For many CPU applications, a reduced share of the LLC can lead to significant performance degradation. In contrast, GPU applications can tolerate an increase in memory access latency when sufficient thread-level parallelism (TLP) is available. In addition to this performance challenge, the introduction of diverse cores onto the same die changes the energy consumption profile and, in turn, affects the energy efficiency of the processor. In this work, we propose heterogeneous LLC management (HeLM), a novel shared LLC management policy that takes advantage of the GPU’s tolerance for memory access latency. HeLM throttles GPU LLC accesses and yields LLC space to cache-sensitive CPU applications. This throttling is achieved by allowing GPU accesses to bypass the LLC when an increase in memory access latency can be tolerated. The latency tolerance of a GPU application is determined by the availability of TLP, which is measured at runtime as the average number of threads that are available for issuing. For a baseline configuration with two CPU cores and four GPU cores, modeled after existing heterogeneous processor designs, HeLM outperforms the least recently used (LRU) policy by 10.4% and also outperforms competing policies. Our evaluations show that HeLM sustains its performance across varying core mixes. In addition to the performance benefit, bypassing reduces total accesses to the LLC, lowering the energy consumption of the LLC module. However, LLC bypassing has the potential to increase off-chip bandwidth utilization and DRAM energy consumption. Our experiments show that HeLM achieves better energy efficiency, reducing the energy-delay-squared product (ED²) by 18% over LRU while incurring only a 7% increase in off-chip bandwidth utilization.
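
To make the TLP-driven bypass idea concrete, the following is a minimal C++ sketch of the decision the abstract describes: GPU accesses bypass the LLC when the measured average number of issue-ready threads indicates the core can hide additional memory latency. The counter structure, sampling scheme, and `tlp_threshold` parameter are illustrative assumptions, not the paper's implementation.

```cpp
#include <cstdint>

// Hypothetical per-GPU-core statistics gathered over a sampling window.
struct GpuCoreStats {
    uint64_t ready_thread_samples = 0;  // sum of "threads available to issue" samples
    uint64_t sample_count = 0;          // number of samples taken in the window
};

// Average TLP observed over the sampling window.
static double average_tlp(const GpuCoreStats& s) {
    return s.sample_count
        ? static_cast<double>(s.ready_thread_samples) / static_cast<double>(s.sample_count)
        : 0.0;
}

// Returns true if this core's GPU accesses should bypass the shared LLC.
// High measured TLP implies the core can tolerate the extra memory latency,
// so its accesses yield LLC capacity to cache-sensitive CPU applications.
static bool should_bypass_llc(const GpuCoreStats& s, double tlp_threshold) {
    return average_tlp(s) >= tlp_threshold;
}
```

In this sketch, lowering `tlp_threshold` makes bypassing more aggressive (more LLC space for CPU applications, more pressure on off-chip bandwidth), while raising it makes the policy behave more like conventional shared-LLC management.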
