Abstract

GPU architects introduced on-chip memories to provide local storage near the processing cores and reduce traffic to the device's global memory. Since then, modeling cache performance has been an active area of research; however, the complexity of this highly parallel hardware makes it far from straightforward. In this paper, we propose a memory model that predicts the performance of the entire cache hierarchy (L1 and L2 caches) in GPUs. Our model is based on reuse distance: we apply an analytical, probabilistic measure to the reuse distance distributions derived from an application's memory traces to predict hit rates. The application's memory trace is extracted using NVIDIA's SASSI instrumentation tool. We evaluate the model on 20 kernels from the Polybench and Rodinia benchmark suites and compare its predictions against real hardware. The average prediction accuracy across all kernels is 86.7%, with higher accuracy for the L2 cache (95.26%) than for the L1. Extracting an application's memory trace is on average 4.9x slower than running the kernel without instrumentation, an overhead much smaller than other published results. Finally, our model is flexible: it accounts for different cache parameters, so it can be used for design space exploration and sensitivity analysis.
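To make the core idea concrete, the sketch below shows the classical reuse-distance rule that underlies models of this kind: for a fully associative LRU cache, an access hits exactly when its reuse distance (the number of distinct cache lines touched since the previous access to the same line) is smaller than the cache's capacity in lines. This is a minimal illustration only; the toy trace, the cache_lines parameter, and the exact-counting approach are assumptions for demonstration, not the paper's probabilistic model or its SASSI trace format.

def reuse_distances(trace):
    """For each access, return the number of distinct cache lines touched
    since the previous access to the same line; first-time accesses get
    infinite distance (compulsory misses)."""
    last_seen = {}  # cache line -> index of its most recent access
    distances = []
    for i, line in enumerate(trace):
        if line in last_seen:
            # Distinct lines referenced strictly between the two accesses.
            distances.append(len(set(trace[last_seen[line] + 1 : i])))
        else:
            distances.append(float("inf"))
        last_seen[line] = i
    return distances

def predict_hit_rate(trace, cache_lines):
    """Under fully associative LRU, an access hits iff its reuse distance
    is smaller than the number of lines the cache can hold."""
    ds = reuse_distances(trace)
    return sum(1 for d in ds if d < cache_lines) / len(ds)

# Toy trace of cache-line addresses (hypothetical, not from SASSI output).
trace = [0, 1, 2, 0, 3, 1, 0, 2, 4, 0]
print(predict_hit_rate(trace, cache_lines=4))  # -> 0.5

A full GPU model additionally has to account for set associativity, massive thread-level interleaving of accesses, and the L1/L2 hierarchy, which is where the paper's probabilistic treatment of the reuse distance distribution comes in.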
