OSM: Off-Chip Shared Memory for GPUs

Sina Darabi, Ehsan Yousefzadeh-Asl-Miandoab, Mohammad Sadrosadati, Hajar Falahati, Negar Akbarzadeh, Hamid Sarbazi-Azad, Pejman Lotfi-Kamran

doi:10.1109/tpds.2022.3154315

Abstract

Graphics Processing Units (GPUs) employ a shared memory, a software-managed cache exposed to programmers, in each streaming multiprocessor to accelerate data sharing among the threads of a thread block. Although, on average, 60% of the shared memory space is underutilized, some workloads demand higher shared memory capacities. Improving shared memory utilization while satisfying the needs of shared-memory-intensive workloads is therefore challenging. We make the key observation that the lifetime of each shared memory address is significantly shorter than the execution time of a thread block. In this paper, we propose Off-Chip Shared Memory (OSM), which allocates shared memory space in off-chip memory and accelerates accesses to it via a small on-chip cache. Using an 8 KB cache for shared memory addresses, OSM provides almost the same performance as the baseline GPU, which uses 96 KB of on-chip shared memory. OSM improves GPU performance in two ways. First, it allocates higher shared memory capacities in off-chip memory, which improves thread-level parallelism (TLP). Second, it uses a unified cache for the shared memory and global address spaces, providing more caching space for the global memory address space even for workloads with high shared memory utilization. Our experimental results show average IPC improvements of 21% and 18% over the baseline and the state-of-the-art architectures, respectively.
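
The link between shared memory capacity and TLP can be illustrated with a rough back-of-the-envelope sketch. This is not the paper's model: it assumes shared memory and a per-SM block-slot cap are the only limits on resident thread blocks, ignoring registers and thread counts. The 96 KB figure comes from the abstract; the 40 KB per-block demand, the cap of 32 blocks, and the function name `resident_blocks` are hypothetical values chosen for illustration.

```python
def resident_blocks(smem_per_block_kb, smem_per_sm_kb, max_blocks_per_sm):
    """Thread blocks that can be resident on one SM.

    Simplifying assumption: only shared memory capacity and a fixed
    block-slot cap limit residency (registers and threads are ignored).
    Pass smem_per_sm_kb=None to model shared memory allocated off-chip,
    where on-chip capacity no longer bounds the number of blocks.
    """
    if smem_per_block_kb <= 0 or smem_per_sm_kb is None:
        return max_blocks_per_sm
    return min(max_blocks_per_sm, smem_per_sm_kb // smem_per_block_kb)


# Hypothetical workload: each thread block requests 40 KB of shared memory.
on_chip = resident_blocks(40, 96, 32)     # 96 KB on-chip shared memory
off_chip = resident_blocks(40, None, 32)  # shared memory placed off-chip

print(on_chip, off_chip)  # prints "2 32"
```

Under these assumptions, the on-chip configuration fits only two blocks per SM, while off-chip allocation lets the block-slot cap become the limit. Whether that extra parallelism pays off depends on how well the small on-chip cache captures the short-lived shared memory addresses, which this sketch does not model.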

Journal: IEEE Transactions on Parallel and Distributed Systems
Publication Date: Dec 1, 2022
Citations: 6

Similar Papers

Current Trends in Parallel Computation and the Implications for Modeling and Optimization

Computer Aided Chemical Engineering | VOL. 27
01 Jan 2009

Shared memory multiplexing
Yi Yang ... Huiyang Zhou
19 Sep 2012

Memory Optimized Dynamic Matrix Chain Multiplication Using Shared Memory in GPU
Girish Biswas ... Nandini Mukherjee
12 Dec 2020

Efficient shared memory with minimal hardware support
Leonidas I Kontothanassis ... Michael L Scott
ACM SIGARCH Computer Architecture News | VOL. 23
01 Sep 1995
