Abstract

General-purpose graphics processing units (GP-GPUs) have emerged as a powerful approach in the era of multi-core processors. GP-GPUs typically employ hundreds of cores and thousands of threads to exploit the data parallelism inherent in GP-GPU applications. With the emergence of these general-purpose workloads, the L1 data cache, which is typically shared by multiple cores and hundreds of threads, has become performance critical. In this paper, we characterize the performance of the L1 data cache in a GPU. We consider general-purpose applications from the Rodinia benchmark suite. We vary the cache size from 32 KB to 256 KB, the associativity from 32 to 256, and the number of banks from 1 to 8. We observe a very high miss rate for most applications, caused by the large working sets of GP-GPU applications. This high miss rate limits the performance gain obtained from increasing the cache size, associativity, and number of banks.
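To make the characterization sweep concrete, the following is a minimal sketch of an LRU set-associative cache model that estimates miss rate over a synthetic address trace. It is illustrative only: the line size, replacement policy, trace, and names (SetAssociativeCache, LINE_SIZE) are assumptions, not the paper's simulation infrastructure, and banking is omitted because in such a simple model it affects bandwidth rather than hit/miss behavior.

    # Illustrative LRU set-associative cache model (assumed parameters, not the
    # authors' simulator). Sweeps cache capacity as in the abstract and reports
    # miss rate for a random-access trace whose working set exceeds the cache.
    import random
    from collections import OrderedDict

    LINE_SIZE = 128  # bytes per cache line (assumed; typical for a GPU L1)

    class SetAssociativeCache:
        def __init__(self, size_bytes, associativity):
            self.assoc = associativity
            lines = size_bytes // LINE_SIZE
            self.num_sets = max(1, lines // associativity)
            # One OrderedDict per set: keys are tags, insertion order tracks LRU recency.
            self.sets = [OrderedDict() for _ in range(self.num_sets)]
            self.hits = 0
            self.misses = 0

        def access(self, address):
            line_addr = address // LINE_SIZE
            set_idx = line_addr % self.num_sets
            tag = line_addr // self.num_sets
            ways = self.sets[set_idx]
            if tag in ways:
                ways.move_to_end(tag)         # refresh LRU position on a hit
                self.hits += 1
            else:
                self.misses += 1
                if len(ways) >= self.assoc:
                    ways.popitem(last=False)  # evict the least recently used line
                ways[tag] = True

        @property
        def miss_rate(self):
            total = self.hits + self.misses
            return self.misses / total if total else 0.0

    if __name__ == "__main__":
        random.seed(0)
        # Assumed ~1 MB working set (8192 lines), larger than every cache size below,
        # mimicking the capacity-miss behavior reported for Rodinia workloads.
        WORKING_SET_LINES = 8192
        trace = [random.randrange(WORKING_SET_LINES) * LINE_SIZE for _ in range(200_000)]
        for size_kb in (32, 64, 128, 256):
            cache = SetAssociativeCache(size_kb * 1024, associativity=32)
            for addr in trace:
                cache.access(addr)
            print(f"{size_kb:4d} KB, 32-way: miss rate = {cache.miss_rate:.2%}")

Under these assumptions the miss rate stays high across the whole sweep because the working set dwarfs even the largest cache, which is the same qualitative effect the abstract describes.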
