Abstract

As we approach the exascale era in supercomputing, designing a balanced system that combines powerful computing ability with low energy consumption becomes increasingly important. The GPU is a widely used accelerator in recently deployed supercomputers: it adopts massive multithreading to hide long latencies and achieves high energy efficiency. In contrast to their strong computing power, GPUs have few on-chip resources, with only several MB of fast on-chip memory per SM (Streaming Multiprocessor). GPU caches exhibit poor efficiency due to the mismatch between the throughput-oriented execution model and the cache hierarchy design. Because of this severe deficiency in on-chip memory, the benefit of the GPU's high computing capacity is dramatically reduced by poor cache performance, which limits system performance and energy efficiency. In this paper, we put forward a locality-protected scheme that makes full use of data locality within the fixed cache capacity: a Locality-Protected method based on instruction PC (LPP) that improves GPU performance. First, we use a PC-based collector to gather the reuse information of each cache line. Then, given this dynamic reuse information, an intelligent cache allocation unit (ICAU) combines it with the LRU (Least Recently Used) replacement policy to identify the cache line with the least locality for eviction. The results show that LPP provides up to a 17.8% speedup and an average improvement of 5.5% over the baseline.
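To make the mechanism concrete, the C++ sketch below illustrates how a PC-indexed reuse collector could be combined with LRU recency to select a victim line, as the abstract describes. This is a minimal illustration under our own assumptions, not the paper's implementation: the names (ReuseCollector, CacheSet, choose_victim) and the lexicographic (predicted reuse, recency) ordering are hypothetical choices made here for clarity.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical per-PC reuse collector: counts how often lines filled by a
// given load PC are hit again, as a proxy for that instruction's locality.
struct ReuseCollector {
    std::unordered_map<std::uint64_t, std::uint32_t> reuse_count;
    void record_hit(std::uint64_t pc) { ++reuse_count[pc]; }
    std::uint32_t predicted_reuse(std::uint64_t pc) const {
        auto it = reuse_count.find(pc);
        return it == reuse_count.end() ? 0 : it->second;
    }
};

struct CacheLine {
    std::uint64_t tag = 0;
    std::uint64_t alloc_pc = 0;  // PC of the instruction that filled the line
    std::uint64_t last_use = 0;  // recency timestamp (larger = more recent)
    bool valid = false;
};

// One set of a set-associative cache. Victim selection orders lines by
// (predicted reuse, recency): least-reused lines lose first, LRU breaks ties.
struct CacheSet {
    std::vector<CacheLine> ways;
    std::uint64_t tick = 0;      // per-set access counter for recency

    explicit CacheSet(std::size_t assoc) : ways(assoc) {}

    // Mark a way as most recently used.
    void touch(std::size_t way) { ways[way].last_use = ++tick; }

    std::size_t choose_victim(const ReuseCollector& rc) const {
        auto key = [&](const CacheLine& l) {
            return std::make_pair(rc.predicted_reuse(l.alloc_pc), l.last_use);
        };
        std::size_t victim = 0;
        for (std::size_t i = 0; i < ways.size(); ++i) {
            if (!ways[i].valid) return i;   // prefer a free way if one exists
            if (key(ways[i]) < key(ways[victim])) victim = i;
        }
        return victim;
    }

    // Install a new line after eviction, recording the filling PC so that
    // future hits can be credited to it in the ReuseCollector.
    void fill(std::size_t way, std::uint64_t tag, std::uint64_t pc) {
        ways[way] = {tag, pc, 0, true};
        touch(way);
    }
};
```

Ordering candidates lexicographically makes predicted locality the primary eviction criterion and plain LRU the tie-breaker, which mirrors the abstract's description of the ICAU coordinating reuse information with the LRU replacement policy.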
