Efficient Management of Cache Accesses to Boost GPGPU Memory Subsystem Performance

Francisco Candel, Julio Sahuquillo, Alejandro Valero, Salvador Petit

doi:10.1109/tc.2019.2907591

Open Access

https://doi.org/10.1109/tc.2019.2907591

Abstract

To support the massive number of memory accesses that GPGPU applications generate, GPU memory hierarchies are becoming increasingly complex, and the Last Level Cache (LLC) size grows considerably with each GPU generation. This paper shows that, counter-intuitively, enlarging the LLC brings only marginal performance gains in most applications. In other words, increasing the LLC size scales neither in performance nor in energy consumption. We examine how LLC misses are managed in typical GPUs and find that, in most cases, the way LLC misses are managed is precisely the main performance limiter. This paper proposes a novel approach that addresses this shortcoming by leveraging a tiny additional Fetch and Replacement Cache-like structure (FRC) that stores control and coherence information of the incoming blocks until they are fetched from main memory. Then, the fetched blocks are swapped with the victim blocks (i.e., those selected to be replaced) in the LLC, and the eviction of such victim blocks is performed from the FRC. This approach improves performance for three main reasons: i) the lifetime of blocks being replaced is extended, ii) the main memory path is unclogged during long bursts of LLC misses, and iii) the average LLC miss latency is reduced. The proposal improves the LLC hit ratio and memory-level parallelism, and reduces the miss latency, compared to much larger conventional caches. Moreover, this is achieved with reduced energy consumption and much lower area requirements. Experimental results show that the proposed FRC cache scales in performance with the number of GPU compute units and the LLC size: depending on the FRC size, performance improves by 30 to 67 percent for a modern baseline GPU card, and by 32 to 118 percent for a larger GPU. In addition, average energy consumption is reduced by 49 to 57 percent for the larger GPU. These benefits come with a small area increase of 7.3 percent over the LLC baseline.
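The first of the three reasons listed in the abstract (a victim block stays usable until the incoming block has actually arrived from main memory) can be illustrated with a toy model. The sketch below is not the paper's design: it assumes a fully associative LRU cache, measures time in accesses, uses a fixed fill delay, and models the side structure as an unbounded dictionary. The names `simulate`, `fill_delay`, `deferred_eviction`, and `pending` are hypothetical.

```python
from collections import OrderedDict


def simulate(trace, llc_lines, fill_delay, deferred_eviction):
    """Count hits in a toy fully associative LRU cache.

    deferred_eviction=False: the victim is dropped as soon as a miss is
    detected, and its line stays reserved until the fill arrives.
    deferred_eviction=True: the miss is only recorded in a side structure
    (`pending`); the victim is evicted when the fill arrives.
    Assumption: one access per time step, fixed fill_delay in accesses.
    """
    llc = OrderedDict()   # block -> None, ordered from LRU to MRU
    pending = {}          # block -> time step at which its fill arrives
    hits = 0
    for now, block in enumerate(trace):
        # Install every block whose data has arrived from memory.
        arrived = [b for b, t in pending.items() if t <= now]
        for b in arrived:
            del pending[b]
            if deferred_eviction and len(llc) >= llc_lines:
                llc.popitem(last=False)      # victim leaves only now
            llc[b] = None
        if block in llc:
            hits += 1
            llc.move_to_end(block)           # refresh LRU position
        elif block not in pending:           # a repeated miss just waits
            if not deferred_eviction and len(llc) + len(pending) >= llc_lines:
                if not llc:
                    continue                 # all lines reserved: not cached
                llc.popitem(last=False)      # victim leaves immediately
            pending[block] = now + fill_delay
    return hits


# Hypothetical access trace: A and B are reused while C is being fetched.
trace = list("ABABABCABAB")
print("early eviction:   ", simulate(trace, 2, 2, deferred_eviction=False))
print("deferred eviction:", simulate(trace, 2, 2, deferred_eviction=True))
```

In this trace, the early-eviction variant loses blocks that are reused while the fill for C is still in flight, whereas the deferred variant keeps serving them. The model says nothing about memory-path congestion, miss latency, energy, or area, which the paper evaluates separately.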

Journal: IEEE Transactions on Computers
Publication Date: Oct 1, 2019
Citations: 26
License type: cc-by-nc-nd

Similar Papers

Access Pattern Based Re-reference Interval Table for Last Level Cache
Baozhong Yu ... Tianzhou Chen
01 Oct 2011

Efficient Cache Resizing policy for DRAM-based LLCs in ChipMultiprocessors
Bindu Agarwalla ... Nilkanta Sahu
Journal of Systems Architecture | VOL. 113
17 Sep 2020

Process variation aware DRAM-Cache resizing
Bindu Agarwalla ... Shirshendu Das
Journal of Systems Architecture | VOL. 123
01 Feb 2022

STEM: Spatiotemporal Management of Capacity for Intra-core Last Level Caches
Dongyuan Zhan ... Hong Jiang
01 Dec 2010
