Abstract

To mitigate the ever-worsening “Power wall” and “Memory wall” problems, multi-core architectures with multi-level cache hierarchies have been widely adopted in modern processors. However, the complexity of these architectures makes modeling of shared caches extremely difficult. In this article, we propose a data-sharing-aware analytical model for estimating the miss rates of the downstream shared cache in multi-core scenarios. To avoid the time-consuming full simulations of the cache architecture required by conventional approaches, the proposed model can also be integrated with our refined upstream cache analytical model, which also evaluates coherence misses with accuracy similar to the state-of-the-art approach at only one tenth of the time overhead. We validate our analytical model against gem5 simulation results on 13 applications from the PARSEC 2.1 benchmark suite. Compared to the results from gem5 simulations under 8 hardware configurations, including dual-core and quad-core architectures, the average absolute error of the predicted shared L2 cache miss rates is less than 2% for all configurations. After integration with the refined upstream model with coherence misses, the overall average absolute error across 4 hardware configurations rises to 4.82% due to error accumulation. As an application case of the integrated model, we also evaluate the miss rates of 57 different multi-core and multi-level cache configurations.

Highlights

  • Performance evaluation plays an important role in the design cycle of next-generation processors, as it allows architects to choose architectural parameters for an optimal performance and energy consumption trade-off

  • We divide the changes to the Reuse Distance Histogram (RDH) caused by the interleaving into two categories: 1) as shown in Fig. 4(b), the reuse distance of reference A increases because of references from the other core, which we call the insertion effect; 2) as shown in Fig. 4(c), when the address of an inserted reference is the same as the endpoint of the reuse epoch, i.e., A in this case, the original reuse epoch is split into two new reuse epochs, which we call the split effect (see the sketch after this list)

  • Considering that our model focuses on data sharing, we only perform the validation in the parallel phase, known as the region of interest (ROI)

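To make the insertion and split effects concrete, the following minimal Python sketch illustrates how interleaving a second core's references either lengthens a reuse epoch or splits it into two. The traces and the `interleaved`/`split` variable names are illustrative assumptions, and reuse distance is taken here as the number of distinct addresses between consecutive accesses to the same address; this is not the paper's exact formulation.

```python
# Illustrative sketch (not the paper's model) of how interleaving another
# core's references changes reuse distances.  Traces are hypothetical.

from collections import defaultdict

def reuse_distances(trace):
    """Reuse distance of each access = number of distinct addresses
    referenced since the previous access to the same address
    (inf for the first, i.e. cold, access)."""
    last_pos = {}                 # address -> index of its previous access
    dists = []
    for i, addr in enumerate(trace):
        if addr in last_pos:
            # distinct addresses seen strictly between the two accesses
            dists.append(len(set(trace[last_pos[addr] + 1:i])))
        else:
            dists.append(float("inf"))
        last_pos[addr] = i
    return dists

def histogram(dists):
    h = defaultdict(int)
    for d in dists:
        h[d] += 1
    return dict(h)

core0 = ["A", "B", "C", "A"]             # one A-to-A reuse epoch, distance 2

# Insertion effect: the other core touches addresses disjoint from the
# epoch, so the reuse distance of the second A grows from 2 to 4.
interleaved = ["A", "X", "B", "Y", "C", "A"]

# Split effect: the other core also references A inside the epoch, so the
# original A-to-A epoch becomes two shorter epochs (distances 1 and 2).
split = ["A", "X", "A", "B", "C", "A"]

print(histogram(reuse_distances(core0)))
print(histogram(reuse_distances(interleaved)))
print(histogram(reuse_distances(split)))
```

Running the sketch, core 0's single epoch of distance 2 becomes one epoch of distance 4 under the insertion effect, and two epochs of distances 1 and 2 under the split effect; this redistribution of the RDH is what the model quantifies analytically.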

Summary

INTRODUCTION

Performance evaluation plays an important role in the design cycle of next-generation processors, as it allows architects to choose architectural parameters for an optimal performance and energy consumption trade-off. To obtain the RDH/SDH of the merged shared-cache reference stream, previous works have to extract the individual SDHs/RDHs of the accesses from each core’s private cache to the shared L2 cache from detailed simulations, which, to a large extent, nullifies the evaluation speed benefits of analytical modeling. Another factor that needs to be considered is data sharing among the threads running on different cores, i.e., different cores accessing the L2 cache with the same addresses. To eliminate the time-consuming full simulations, we integrate the proposed model with the upstream model [6], [7] put forward in our previous work, which outputs the individual L2-accessing RDHs from each core’s private cache, to construct a multi-core, multi-level cache model framework.
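For context, once an RDH of the shared-cache reference stream is available, a miss-rate estimate can be obtained with the standard fully associative LRU approximation: an access misses when its reuse distance is at least the number of cache blocks. The sketch below shows only this generic conversion step; the histogram values and cache size are hypothetical, and the data-sharing-aware merging of the per-core RDHs is the subject of the proposed model, not of this snippet.

```python
# Generic reuse-distance-to-miss-rate conversion for a fully associative
# LRU cache (an illustration, not the paper's data-sharing-aware model).

def miss_rate_from_rdh(rdh, cache_blocks):
    """rdh maps reuse distance (in distinct blocks, inf for cold accesses)
    to access counts.  Under fully associative LRU, an access misses when
    its reuse distance is at least the number of cache blocks."""
    total  = sum(rdh.values())
    misses = sum(cnt for dist, cnt in rdh.items()
                 if dist == float("inf") or dist >= cache_blocks)
    return misses / total if total else 0.0

# Hypothetical RDH of one core's accesses reaching the shared L2
example_rdh = {4: 500, 64: 300, 1024: 150, float("inf"): 50}

# 512 KiB L2 with 64-byte blocks -> 8192 blocks; here only the 50 cold
# accesses miss, giving a miss rate of 0.05.
print(miss_rate_from_rdh(example_rdh, 512 * 1024 // 64))
```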

RELATED WORKS
QUANTIFYING THE SPLIT EFFECT
INTEGRATING WITH THE UPSTREAM CACHE MODEL
EVALUATION
APPLICATION OF THE INTEGRATED MODEL
CONCLUSION