Abstract

Cache partitioning is a proven technique for saving energy in a shared cache, and existing studies focus on multi-program workloads running in multicore systems. In this paper, we are motivated by the observation that a multi-thread application generally executes faster than its single-thread counterpart and that its cache access behavior is quite different. Based on this observation, we study applications running in multi-thread mode and classify the data of multi-thread applications into shared and private categories, which reduces the interference between shared and private data and supports a more efficient cache partitioning scheme. We also propose a hardware structure to support these operations. We then propose an access adaptive and thread-aware cache partitioning (ATCP) scheme, which assigns separate cache portions to shared and private data to avoid the evictions caused by conflicts between the two data categories in the shared cache. Compared with the least recently used (LRU) managed, core-based evenly partitioned (EVEN) and utility-based cache partitioning (UCP) schemes, ATCP achieves lower energy consumption while also improving application performance. The experimental results show that ATCP achieves 29.6% and 19.9% average energy savings compared with the LRU and UCP schemes in a quad-core system. Moreover, the average speedup of multi-thread ATCP with respect to single-thread LRU is 1.89.

Highlights

  • To meet performance demands, chip multiprocessor (CMP) architectures have been widely used for decades

  • The main competition field of CMPs has shifted to embedded systems, in which energy optimization is essential since many such systems are battery powered

  • We evaluate the L2 cache energy consumption of our access adaptive and thread-aware cache partitioning (ATCP) scheme


Introduction

Chip multiprocessor (CMP) architectures have been widely used for decades. Multicore architectures are typically equipped with a small private L1 cache for each core and a large shared L2 cache. These cache memories occupy a large portion of the total chip area and consume a large fraction of the total chip energy, reaching over 50% of the whole chip's energy consumption [3,4,5]. The energy consumption of the relatively large L2 cache can be cut down by turning off unused ways [6,7,8,9] or by implementing it as a drowsy cache [5,10]. Cache partitioning is an effective way of tackling this problem.
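The mechanism above can be illustrated with a minimal simulation sketch of way-based partitioning in a set-associative cache, in the spirit of the ATCP idea: shared and private data are confined to disjoint sets of ways, so the two categories never evict each other, and any ways assigned to neither partition can be power-gated to save static energy. All class names, parameters, and way counts here are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch (assumed names/parameters): way-partitioned
# set-associative cache with per-category way allocation.
class WayPartitionedCache:
    def __init__(self, num_sets=64, num_ways=8, shared_ways=3, private_ways=4):
        assert shared_ways + private_ways <= num_ways
        self.num_sets = num_sets
        # Each data category may only allocate into its own way indices.
        self.ways = {
            "shared": range(0, shared_ways),
            "private": range(shared_ways, shared_ways + private_ways),
        }
        # Ways outside both partitions stay powered off (static energy saving).
        self.powered_off_ways = num_ways - shared_ways - private_ways
        # tags[set][way] holds the cached tag (None = empty line);
        # lru[set] tracks way recency, least recently used first.
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        self.lru = [[] for _ in range(num_sets)]

    def access(self, addr, category):
        """Look up addr; on a miss, evict the LRU line *within the same
        category's ways*, so shared and private data never conflict."""
        s = addr % self.num_sets
        tag = addr // self.num_sets
        allowed = self.ways[category]
        for w in allowed:
            if self.tags[s][w] == tag:          # hit: refresh recency
                self.lru[s].remove(w)
                self.lru[s].append(w)
                return True
        # Miss: prefer an empty way in this partition, else its LRU way.
        victim = next((w for w in allowed if self.tags[s][w] is None), None)
        if victim is None:
            victim = next(w for w in self.lru[s] if w in allowed)
            self.lru[s].remove(victim)
        self.tags[s][victim] = tag
        self.lru[s].append(victim)
        return False
```

Because evictions are confined to each category's own ways, a burst of private-data misses cannot displace shared lines, which is the conflict-isolation property the partitioning scheme relies on.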

