Virtual-Cache: A cache-line borrowing technique for efficient GPU cache architectures

Bingchao Li, Jizeng Wei, Nam Sung Kim

doi:10.1016/j.micpro.2021.104301

Open Access

https://doi.org/10.1016/j.micpro.2021.104301

Abstract

GPUs provide megabytes of registers and shared memory: registers hold the contexts of thousands of threads, and shared memory enables fast data sharing among the threads of a thread block. GPUs also employ an L1 cache to serve memory requests at high bandwidth. However, the average L1 cache capacity per thread is very limited, which causes cache thrashing and in turn degrades performance. Meanwhile, many registers and much of the shared memory are not assigned to any warp or thread block, and even assigned registers and shared memory sit idle once their warps or thread blocks have finished. Exploiting these observations, we propose Virtual-Cache, which cost-effectively increases the effective size of the L1 cache by using unassigned and released registers and shared memory as cache lines. Specifically, unassigned registers and shared memory serve cache requests directly. Registers assigned to a warp can work as cache lines after the warp completes execution and until they are accessed again by a newly launched warp. The shared memory of a thread block can serve cache requests from the time the thread block finishes until it is referenced by shared memory instructions of the relaunched thread block. The register file, shared memory, and L1 cache remain physically independent but are logically unified as one large virtual cache with redesigned cache-line management. We develop the control and data path for the register file, making it accessible to cache requests by borrowing an operand collector to serve them. We also extend the control and data path of the shared memory to serve cache requests. Our evaluation shows that, for cache-sensitive applications, Virtual-Cache improves performance by 28% over the previously proposed cache management technique.
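The borrowing idea in the abstract can be illustrated with a small software model. The sketch below is not the paper's hardware design: it is a toy LRU cache whose capacity grows when idle storage is lent to it and shrinks when that storage is reclaimed. The class name `VirtualCache`, the methods `lend` and `reclaim`, the helper `hit_rate`, the 128-byte line size, and the LRU replacement policy are all assumptions made for illustration; the abstract does not specify any of them.

```python
from collections import OrderedDict

LINE_BYTES = 128  # assumed cache-line size; not stated in the abstract


class VirtualCache:
    """Toy LRU cache whose capacity changes as lines are borrowed (hypothetical model)."""

    def __init__(self, l1_lines):
        self.l1_lines = l1_lines     # lines physically present in the L1 cache
        self.borrowed_lines = 0      # lines lent by idle registers or shared memory
        self.lines = OrderedDict()   # line tag -> True, ordered oldest to newest
        self.hits = 0
        self.misses = 0

    @property
    def capacity(self):
        return self.l1_lines + self.borrowed_lines

    def lend(self, n_lines):
        """Unassigned or released storage becomes usable as cache lines."""
        self.borrowed_lines += n_lines

    def reclaim(self, n_lines):
        """A newly launched warp or thread block takes its storage back."""
        self.borrowed_lines = max(0, self.borrowed_lines - n_lines)
        while len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then filled)."""
        tag = address // LINE_BYTES
        if tag in self.lines:
            self.lines.move_to_end(tag)
            self.hits += 1
            return True
        self.misses += 1
        self.lines[tag] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)
        return False


def hit_rate(cache, addresses, passes):
    """Replay the address stream `passes` times and return the overall hit rate."""
    for _ in range(passes):
        for address in addresses:
            cache.access(address)
    return cache.hits / (cache.hits + cache.misses)


# A working set of 8 lines swept cyclically thrashes a 4-line LRU cache.
working_set = [i * LINE_BYTES for i in range(8)]

l1_only = VirtualCache(l1_lines=4)
print(hit_rate(l1_only, working_set, passes=4))       # 0.0

with_borrowing = VirtualCache(l1_lines=4)
with_borrowing.lend(4)                                # idle storage lends 4 lines
print(hit_rate(with_borrowing, working_set, passes=4))  # 0.75
```

In this toy setting the working set is twice the L1 capacity, so every access misses until borrowed lines raise the capacity to cover it. Calling `reclaim` models a relaunched warp or thread block taking its storage back, which evicts the excess lines.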

Journal: Microprocessors and Microsystems
Publication Date: Jun 26, 2021
Citations: 2
License type: publisher-specific-oa

Similar Papers

Analysis of Thread Block Scheduling Algorithms for General Purpose GPU Systems
Soyeon Park ... Kyungwoon Cho
08 Dec 2021

An analytical GPU performance model for 3D stencil computations from the angle of data traffic
Huayou Su ... Chunyuan Zhang
The Journal of Supercomputing | VOL. 71
26 Feb 2015

Improving GPGPU resource utilization through alternative thread block scheduling
Minseok Lee ... Joosik Moon
01 Feb 2014

On the Cache Behavior of SPLASH-2 Benchmarks on ARM and ALPHA Processors in Gem5 Full System Simulator
18 Dec 2014
