A Quantitative Study of Locality in GPU Caches for Memory-Divergent Workloads

Sohan Lal, Bogaraju Sharatchandra Varma, Ben Juurlink

doi:10.1007/s10766-022-00729-2

Abstract

GPUs can deliver peak performance in the TFLOPS range, but this peak is often difficult to achieve because of several performance bottlenecks. Memory divergence is one such bottleneck: it makes locality harder to exploit and causes cache thrashing and high miss rates, thereby impeding GPU performance. As data locality is crucial for performance, there have been several efforts to exploit data locality in GPUs. However, a quantitative analysis of data locality, which could pave the way for optimizations, is lacking. In this paper, we quantitatively study data locality and its limits in GPUs at different granularities. We show that, in contrast to previous studies, there is significantly higher inter-warp locality at the L1 data cache for memory-divergent workloads. We further show that about 50% of the cache capacity and of other scarce resources, such as NoC bandwidth, is wasted due to data over-fetch caused by memory divergence. While the low spatial utilization of cache lines justifies the sectored-cache design, which fetches only those sectors of a cache line that are needed by a request, our limit study reveals lost spatial locality: additional memory requests are needed later to fetch the other sectors of the same cache line. This lost spatial locality presents opportunities for further optimizing the cache design.
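
The over-fetch described above can be illustrated with a small model. The following is a minimal sketch, not the paper's methodology: it assumes a 32-thread warp, 4-byte words, 128-byte cache lines, and 32-byte sectors, and the helper names `warp_addresses` and `fetch_stats` are hypothetical. It counts how many blocks one warp-wide load touches and what fraction of the fetched bytes the warp actually uses, at line and at sector granularity.

```python
# Illustrative model only; all sizes below are assumptions, not values from the paper.
LINE_BYTES = 128    # assumed cache-line size
SECTOR_BYTES = 32   # assumed sector size within a line
WARP_SIZE = 32      # assumed threads per warp
WORD_BYTES = 4      # each thread loads one 4-byte word


def warp_addresses(stride_words, base=0):
    """Byte address loaded by each thread when thread t reads word t * stride_words."""
    return [base + t * stride_words * WORD_BYTES for t in range(WARP_SIZE)]


def fetch_stats(addresses, granularity):
    """Return (blocks fetched, fraction of fetched bytes the warp uses)."""
    # Distinct blocks (lines or sectors) touched by the warp.
    blocks = {a // granularity for a in addresses}
    # Distinct bytes actually requested by the threads.
    used = {a + i for a in addresses for i in range(WORD_BYTES)}
    return len(blocks), len(used) / (len(blocks) * granularity)


if __name__ == "__main__":
    for stride in (1, 2, 32):
        addrs = warp_addresses(stride)
        print(stride,
              fetch_stats(addrs, LINE_BYTES),
              fetch_stats(addrs, SECTOR_BYTES))
```

Under these assumptions, a unit-stride (coalesced) load touches one line and uses all of it, while a stride of 32 words touches 32 different lines and uses only 4 bytes of each. Fetching at sector granularity reduces the wasted bytes for the divergent case, but any later access to another sector of the same line then needs its own memory request, which is the lost spatial locality the abstract refers to.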


Journal: International Journal of Parallel Programming
Publication Date: Apr 1, 2022
Citations: 2
License type: open-access

Similar Papers

ID-cache: instruction and memory divergence based cache management for GPUs
Akhil Arunkumar ... Shin-Ying Lee
01 Sep 2016

Improving First Level Cache Efficiency for GPUs Using Dynamic Line Protection
Xian Zhu ... Robert Wernsman
13 Aug 2018

Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance
Rachata Ausavarungnirun ... Onur Kayiran
01 Oct 2015

Divergence-aware warp scheduling
Timothy G Rogers ... Mike O'Connor
07 Dec 2013
