Abstract

GPU architects introduced on-chip memories to provide local storage near the processing cores and reduce traffic to the device's global memory. Since then, modeling cache performance has been an active area of research; however, the complexity of this highly parallel hardware makes it far from straightforward. In this paper, we propose a memory model that predicts the performance of the entire cache hierarchy (L1 and L2 caches) in GPUs. Our model is based on reuse distance: we apply an analytical, probabilistic measure to the reuse distance distributions derived from an application's memory traces to predict hit rates. The application's memory trace is extracted using NVIDIA's SASSI instrumentation tool. We evaluate the model on 20 kernels from the Polybench and Rodinia benchmark suites and compare its predictions against real hardware. The average prediction accuracy across all kernels is 86.7%, with higher accuracy for the L2 cache (95.26%) than for the L1. Extracting an application's memory trace is on average 4.9x slower than running the kernel without instrumentation, an overhead much smaller than other published results. Finally, our model is flexible: it accounts for different cache parameters, so it can be used for design space exploration and sensitivity analysis.
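To make the core idea concrete, the sketch below shows the classical reuse-distance rule that underlies models of this kind: for a fully associative LRU cache, an access hits exactly when its reuse distance (the number of distinct cache lines touched since the previous access to the same line) is smaller than the cache's capacity in lines. This is a minimal illustration only; the toy trace, the cache_lines parameter, and the exact-counting approach are assumptions for demonstration, not the paper's probabilistic model or its SASSI trace format.

def reuse_distances(trace):
    """For each access, return the number of distinct cache lines touched
    since the previous access to the same line; first-time accesses get
    infinite distance (compulsory misses)."""
    last_seen = {}  # cache line -> index of its most recent access
    distances = []
    for i, line in enumerate(trace):
        if line in last_seen:
            # Distinct lines referenced strictly between the two accesses.
            distances.append(len(set(trace[last_seen[line] + 1 : i])))
        else:
            distances.append(float("inf"))
        last_seen[line] = i
    return distances

def predict_hit_rate(trace, cache_lines):
    """Under fully associative LRU, an access hits iff its reuse distance
    is smaller than the number of lines the cache can hold."""
    ds = reuse_distances(trace)
    return sum(1 for d in ds if d < cache_lines) / len(ds)

# Toy trace of cache-line addresses (hypothetical, not from SASSI output).
trace = [0, 1, 2, 0, 3, 1, 0, 2, 4, 0]
print(predict_hit_rate(trace, cache_lines=4))  # -> 0.5

A full GPU model additionally has to account for set associativity, massive thread-level interleaving of accesses, and the L1/L2 hierarchy, which is where the paper's probabilistic treatment of the reuse distance distribution comes in.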
