A Reuse-Degree Based Locality Classifier for Locality-Aware Data Replication

Qianqian Wu, Zhenzhou Ji

doi:10.1109/access.2019.2959840

Abstract

The last-level cache (LLC) in a shared configuration is widely used in tiled chip multiprocessors (CMPs); it reduces the off-chip miss rate but incurs long on-chip access latency. The state-of-the-art Locality-Aware Data Replication (LADR) scheme provides an effective tradeoff between capacity and latency through an in-hardware structure called a locality classifier. However, the best classifier in LADR, Limited3, preserves locality information for 3 cores for every cache line indiscriminately. This is superfluous for lines reused by fewer than 3 cores and incomplete for lines reused by more than 3 cores, which both wastes storage space and limits the performance improvement. In this paper, we propose a novel concept of Reuse-Degree (RD) for each LLC line: the number of cores that have reused the line since it was loaded into the LLC. We then divide cache lines into Not Reused Lines (NRL, RD = 0), Single Reused Lines (SRL, RD = 1), and Multiple Reused Lines (MRL, RD >= 2) based on their RDs, and find that a significant fraction of LLC lines are NRLs or SRLs at any time. Based on this observation, we design a Reuse-Degree based Locality Classifier (RD_LC) for LADR. Specifically, RD_LC decouples the locality classifier from the LLC tag array and introduces two kinds of locality information arrays: the single locality information array (SLIA) and the complete locality information array (CLIA). RD_LC allocates a locality information entry only for reused cache lines (SRLs or MRLs) rather than for all cache lines, assigning an SLIA entry to each SRL and a CLIA entry to each MRL. Our proposal avoids wasting storage space while maintaining enough locality information for accurate data replication decisions. Experimental results show that RD_LC for LADR reduces storage overhead by 51% relative to the baseline Limited3 locality classifier, while improving performance by 7.56% and reducing network traffic by 3.33%.
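The classification described in the abstract can be illustrated with a small software model. This is only a sketch, not the authors' hardware design: the names `LineClass`, `classify`, `entry_kind`, and `ReuseTracker` are hypothetical, and each call to `reuse` stands for whatever event the hardware counts as a core reusing a line, which the abstract does not specify.

```python
from enum import Enum


class LineClass(Enum):
    NRL = "not reused line"       # RD = 0
    SRL = "single reused line"    # RD = 1
    MRL = "multiple reused line"  # RD >= 2


def classify(reuse_degree: int) -> LineClass:
    """Map a line's reuse-degree (number of cores that reused it) to its class."""
    if reuse_degree < 0:
        raise ValueError("reuse-degree cannot be negative")
    if reuse_degree == 0:
        return LineClass.NRL
    if reuse_degree == 1:
        return LineClass.SRL
    return LineClass.MRL


def entry_kind(line_class: LineClass):
    """Return which locality information array holds the line's entry, if any."""
    # NRLs get no entry; SRLs get an SLIA entry; MRLs get a CLIA entry.
    return {
        LineClass.NRL: None,
        LineClass.SRL: "SLIA",
        LineClass.MRL: "CLIA",
    }[line_class]


class ReuseTracker:
    """Hypothetical model: remembers which cores have reused each LLC line."""

    def __init__(self):
        self._reusers = {}  # line address -> set of core ids that reused it

    def load(self, addr):
        # RD is counted from the moment the line is loaded into the LLC.
        self._reusers[addr] = set()

    def reuse(self, addr, core):
        # Repeated reuse by the same core does not raise RD.
        self._reusers[addr].add(core)

    def reuse_degree(self, addr):
        return len(self._reusers[addr])
```

In this model a freshly loaded line is an NRL with no entry, it moves to an SLIA entry once one core reuses it, and it moves to a CLIA entry once a second distinct core reuses it.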

Highlights

It is commonly believed that tiled chip multiprocessors (CMPs), which contain a series of identical tiles connected over a switched direct network, are becoming the most scalable and promising architectures for future many-core CMPs [1]–[3].
In this paper, we analyze the hardware overhead and performance problems that result from the coupled structure of Limited3 in the Locality-Aware Data Replication (LADR) scheme [7], and take advantage of decoupled structures [13], [14] to design a decoupled locality classifier for LADR. The classifier introduces two kinds of locality information arrays and allocates storage space according to the reuse-degree (RD) of each cache line.
On the other hand, when a cache line is accessed as a home LLC line, the locality information in Limited3 is coupled with the directory sharer list, whereas the locality information in the Reuse-Degree based Locality Classifier (RD_LC) is stored in the single locality information array (SLIA) or MLIA, which is decoupled from the LLC directory.

Summary

Introduction

It is commonly believed that tiled chip multiprocessors (CMPs), which contain a series of identical tiles connected over a switched direct network, are becoming the most scalable and promising architectures for future many-core CMPs [1]–[3]. LADR introduces an in-hardware, run-time Complete Locality Classifier (Complete) that tracks the locality information of all cores for each LLC cache line and uses it to guide replication decisions.
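To make the difference between tracking all cores (Complete) and tracking a fixed number of cores (as Limited3 does with 3) more concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: `LimitedTracker` is a hypothetical name, the per-core "locality information" is reduced to a simple reuse counter, and cores that do not fit are simply left untracked, which is a simplification rather than the policy LADR actually uses.

```python
class LimitedTracker:
    """Hypothetical per-line model that keeps locality info for at most k cores."""

    def __init__(self, k):
        self.k = k
        self.slots = {}  # core id -> reuse count (stand-in for locality info)

    def record(self, core):
        """Record a reuse by `core`; return False if the core cannot be tracked."""
        if core in self.slots:
            self.slots[core] += 1
            return True
        if len(self.slots) < self.k:
            self.slots[core] = 1
            return True
        # Assumption: once all k slots are taken, other cores go untracked.
        return False

    def unused_slots(self):
        return self.k - len(self.slots)


# A line reused by a single core leaves 2 of its 3 slots idle.
single = LimitedTracker(k=3)
single.record(0)
idle = single.unused_slots()

# A line reused by 5 cores cannot keep information for the last 2 of them.
shared = LimitedTracker(k=3)
untracked = [core for core in range(5) if not shared.record(core)]

# With k equal to the core count (a Complete-style tracker), nothing is lost,
# at the cost of reserving one slot per core for every line.
complete = LimitedTracker(k=16)
lost = [core for core in range(5) if not complete.record(core)]
```

In this toy model the first case corresponds to the wasted storage and the second to the incomplete locality information that motivate allocating entries according to reuse-degree.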

Journal: IEEE Access
Publication Date: Jan 1, 2019
Citations: 1
License type: CC BY 4.0
