Abstract

Modern graphics processing units (GPUs), with their massively parallel architectures, can boost the performance of both graphics and general-purpose applications. Supported by new programming tools, GPUs have become one of the most attractive platforms for exploiting high thread-level parallelism. Recent GPUs employ hierarchical cache memories to support irregular memory-access patterns. However, the L1 data cache exhibits poor efficiency in GPUs, mainly due to cache contention and resource congestion. This paper shows that the L1 data cache does not always improve application performance; in fact, many applications are slowed down by its use. To increase the efficiency of GPU cache management and improve GPU performance, a novel cache bypassing mechanism (CARB) is proposed. CARB exploits the characteristics of the currently executing application to estimate the performance impact of the L1 data cache, and then allows memory requests to bypass the cache in discrete phases during execution. The bypassing decision is made adaptively at runtime. Experimental results show that CARB achieves an average speedup of 22% across a wide range of GPGPU applications.
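To illustrate what an adaptive, phase-based bypassing decision can look like, the following is a minimal host-side C++ sketch, not the paper's actual CARB algorithm. It assumes a hypothetical decision rule in which per-phase L1 hit/access counters are sampled and requests are routed around the L1 for the next phase whenever the observed hit rate falls below a threshold; all names, thresholds, and phase lengths are illustrative assumptions.

```cpp
// Illustrative sketch only: the abstract does not specify CARB's exact rule.
// Models one plausible form of adaptive, phase-based cache bypassing:
// the bypass decision is re-evaluated each phase from the previous phase's
// observed L1 hit rate. Thresholds and counters here are hypothetical.

#include <cstdio>
#include <vector>

struct PhaseStats {
    unsigned hits = 0;      // L1 data-cache hits observed in the phase
    unsigned accesses = 0;  // total memory requests observed in the phase
};

// Hypothetical rule: bypass the L1 in the next phase if the last phase's
// hit rate was below `threshold` (low data reuse), otherwise keep caching.
bool shouldBypassNextPhase(const PhaseStats& last, double threshold = 0.25) {
    if (last.accesses == 0) return false;         // no evidence yet, keep caching
    double hitRate = double(last.hits) / last.accesses;
    return hitRate < threshold;
}

int main() {
    // Synthetic per-phase counters standing in for hardware statistics.
    std::vector<PhaseStats> phases = {{900, 1000}, {120, 1000}, {50, 1000}, {700, 1000}};
    bool bypass = false;                          // start with the L1 enabled
    for (size_t i = 0; i < phases.size(); ++i) {
        std::printf("phase %zu: bypass=%s\n", i, bypass ? "on" : "off");
        bypass = shouldBypassNextPhase(phases[i]); // adapt for the next phase
    }
    return 0;
}
```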
