Abstract

Memory footprint is a metric for quantifying data reuse in a memory trace. It can also be used to approximate cache performance, especially in shared-cache systems. The memory footprint is obtained through memory footprint analysis (FPA). The main limitation of FPA is that, for a memory trace of n accesses, the all-window FPA algorithm requires O(n^3) time. In this paper, we therefore propose an analytical algorithm for FPA that calculates the average footprints in O(n^2) time. The proposed algorithm can also be employed for window distribution analysis. Moreover, we propose a framework that enables the application of FPA to GPU kernels and models the performance of L1 cache memories. Experimental evaluations indicate that the proposed framework runs 1.55X slower than Xiang's formula, a fast average FPA method, but it can additionally be used for window distribution analysis. In the context of FPA-based cache performance estimation, the experimental results indicate a fair correlation between the estimated L1 miss rates and those of native GPU executions. On average, the proposed framework has a 23.8% error in the estimation of L1 cache miss rates. Further, our algorithm runs 125X slower than reuse distance analysis (RDA) when analyzing a single kernel. However, the proposed method outperforms RDA in modeling shared caches and multiple kernel executions on GPUs.
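To make the complexity figures above concrete, the following is a minimal Python sketch of average footprint computation, under the common definition that the footprint of a window is the number of distinct addresses it touches. It is an illustration only and is not the analytical algorithm proposed in the paper. The function names `average_footprints_bruteforce` and `average_footprints_incremental` are hypothetical. The first function recounts distinct addresses for every window, which is cubic in the trace length; the second reuses the set of seen addresses as each window grows, which is one simple way to reach quadratic time.

```python
def average_footprints_bruteforce(trace):
    """Average footprint per window length, recounting every window.

    There are O(n^2) windows and each recount costs O(w), so the total
    cost is O(n^3). Illustrative baseline only.
    """
    n = len(trace)
    result = {}
    for w in range(1, n + 1):
        total = 0
        for start in range(n - w + 1):
            # Footprint of this window = number of distinct addresses in it.
            total += len(set(trace[start:start + w]))
        result[w] = total / (n - w + 1)
    return result


def average_footprints_incremental(trace):
    """Same averages in O(n^2) time by growing each window one access at a time.

    This is a simple incremental scheme chosen for illustration; it is an
    assumption of this sketch, not the paper's analytical method.
    """
    n = len(trace)
    sums = [0] * (n + 1)  # sums[w] = total footprint over all windows of length w
    for start in range(n):
        seen = set()
        for end in range(start, n):
            seen.add(trace[end])
            sums[end - start + 1] += len(seen)
    return {w: sums[w] / (n - w + 1) for w in range(1, n + 1)}
```

For the four-access trace a, b, a, c, both functions give average footprints of 1.0, 2.0, 2.5, and 3.0 for window lengths 1 through 4.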

