Last-level Cache Research Articles

The shared last-level cache (LLC) is widely used in tiled chip multiprocessors (CMPs). It reduces the off-chip miss rate but incurs a long on-chip access latency. The state-of-the-art Locality-Aware Data Replication (LADR) scheme provides an effective tradeoff between capacity and latency through a hardware structure called the locality classifier. However, the best-performing locality classifier in LADR, Limited3, preserves locality information for 3 cores for every cache line indiscriminately. This is superfluous for lines reused by fewer than 3 cores and incomplete for lines reused by more than 3 cores, which both wastes storage space and limits the performance improvement. In this paper, we propose the novel concept of a Reuse-Degree (RD) for each LLC line: the number of cores that have reused the line since it was loaded into the LLC. We then divide cache lines into Not Reused Lines (NRL, RD = 0), Single Reused Lines (SRL, RD = 1) and Multiple Reused Lines (MRL, RD >= 2) based on their RDs, and find that a significant fraction of LLC lines are NRLs or SRLs at any time. Based on this observation, we design a Reuse-Degree based Locality Classifier (RD_LC) for LADR. Specifically, RD_LC decouples the locality classifier from the LLC tag array and introduces two kinds of locality information arrays: the single locality information array (SLIA) and the complete locality information array (CLIA). RD_LC allocates a locality information entry only for reused cache lines (SRLs or MRLs) rather than for all cache lines, assigning an SLIA entry to each SRL and a CLIA entry to each MRL. Our proposal avoids wasting storage space while maintaining enough locality information for accurate data replication decisions. Experimental results show that RD_LC for LADR reduces storage overhead by 51% relative to the baseline Limited3 locality classifier, while improving performance by 7.56% and reducing network traffic by 3.33%.
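The classification rule in the abstract above (RD = 0, RD = 1, RD >= 2) and the resulting choice between no entry, an SLIA entry, and a CLIA entry can be illustrated with a small software sketch. This is not the authors' hardware design: the names `LLCLine`, `record_reuse`, `classify` and `locality_array_for` are hypothetical, and the sketch assumes that the access which first loads a line into the LLC does not itself count as a reuse.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class LineClass(Enum):
    NRL = "Not Reused Line"        # RD = 0
    SRL = "Single Reused Line"     # RD = 1
    MRL = "Multiple Reused Line"   # RD >= 2


@dataclass
class LLCLine:
    # Hypothetical bookkeeping: IDs of the cores that have reused this line
    # since it was loaded into the LLC. Real hardware would not store a set.
    reusers: Set[int] = field(default_factory=set)

    def record_reuse(self, core_id: int) -> None:
        # Assumption: called only for accesses that count as a reuse,
        # not for the initial fill of the line.
        self.reusers.add(core_id)

    @property
    def reuse_degree(self) -> int:
        # RD is the number of distinct cores that have reused the line.
        return len(self.reusers)


def classify(line: LLCLine) -> LineClass:
    """Map a line's Reuse-Degree to NRL, SRL or MRL as in the abstract."""
    rd = line.reuse_degree
    if rd == 0:
        return LineClass.NRL
    if rd == 1:
        return LineClass.SRL
    return LineClass.MRL


def locality_array_for(line: LLCLine) -> Optional[str]:
    """Return which locality information array holds an entry for the line.

    NRLs get no entry, SRLs get an SLIA entry, MRLs get a CLIA entry.
    """
    return {
        LineClass.NRL: None,
        LineClass.SRL: "SLIA",
        LineClass.MRL: "CLIA",
    }[classify(line)]
```

In this sketch, repeated reuse by the same core leaves the line an SRL; it moves to a CLIA entry only once a second distinct core reuses it.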

With the increasing complexity of recent autonomous platforms, there is strong demand to better utilize system resources while satisfying stringent real-time requirements. Embedded virtualization is an appealing technology to meet this demand. It enables the consolidation of real-time systems with different criticality levels on a single hardware platform by enforcing temporal isolation. On multi-core platforms, however, shared hardware resources, such as caches and memory buses, weaken this isolation. In particular, a large last-level cache in recent processors can easily jeopardize the timing predictability of real-time tasks because of cache interference. While researchers in the real-time systems community have developed solutions to this problem, existing cache management schemes have two major limitations when used in a clustered multi-core embedded system. The first is the cache co-partitioning problem, which can lead to incorrect cache allocation and cache underutilization. The second is the cache interference caused by inter-virtual-machine (VM) communication, which arises because prior work has considered only independent tasks. This paper presents a cluster-aware real-time cache allocation scheme to address these problems. The proposed scheme takes the cluster information of the system into account and finds a cache allocation that satisfies the timing and memory requirements of tasks. The scheme also maximizes the slack time available for meeting task deadlines, which brings flexibility and resilience to unexpected events. Tasks using inter-VM communication are additionally provided with guaranteed blocking time and cache isolation. We have implemented a prototype of our scheme on an Nvidia TX2 clustered multi-core platform and evaluated its effectiveness against cluster-unaware approaches.

Articles published on Last-level Cache

DUCATI

A Novel High Performance and Energy Efficient NUCA Architecture for STT-MRAM LLCs With Thermal Consideration

Miss-aware LLC buffer management strategy based on heterogeneous multi-core

Spy Cartel: Parallelizing Evict+Time-Based Cache Attacks on Last-Level Caches

Writeback-Aware LLC Management for PCM-Based Main Memory Systems

Reducing Writebacks Through In-Cache Displacement

A Gaussian Set Sampling Model for Efficient Shared Cache Profiling on Multi-Cores

A Reuse-Degree Based Locality Classifier for Locality-Aware Data Replication

PR-LRU: partial random LRU technique for performance improvement of last level cache

MemWander: Memory Dynamic Remapping via Hypervisor Against Cache-Based Side-Channel Attacks

Cache-Aware Real-Time Virtualization for Clustered Multi-Core Platforms

Hybrid Remote Access Protocol

Staccato: shared-memory work-stealing task scheduler with cache-aware memory management

Decoupled Fused Cache

Harvesting Row-Buffer Hits via Orchestrated Last-Level Cache and DRAM Scheduling for Heterogeneous Multicore Systems

Contention-Aware Fair Scheduling for Asymmetric Single-ISA Multicore Systems

Fast modeling DRAM access latency based on the LLC memory stride distribution without detailed simulations

ReD: A reuse detector for content selection in exclusive shared last-level caches

A fault-tolerant last level cache for CMPs operating at ultra-low voltage
