Thread Batching for High-performance Energy-efficient GPU Memory Design

Bing Li, Tao Liu, Yiran Chen, Mengjie Mao, Wujie Wen, Zihao Liu, Xiaoxiao Liu, Hai (Helen) Li

doi:10.1145/3330152

Abstract

Massive multi-threading in GPUs imposes tremendous pressure on memory subsystems. Because thread-level parallelism in GPUs has grown rapidly while peak memory bandwidth has improved slowly, memory has become a bottleneck for GPU performance and energy efficiency. In this article, we propose an integrated architectural scheme that optimizes memory accesses and thereby boosts the performance and energy efficiency of GPUs. First, we propose thread batch enabled memory partitioning (TEMP) to improve GPU memory access parallelism. In particular, TEMP groups multiple thread blocks that share the same set of pages into a thread batch and applies a page coloring mechanism to bind each streaming multiprocessor (SM) to dedicated memory banks. TEMP then dispatches the thread batch to an SM to ensure highly parallel memory-access streaming from the different thread blocks. Second, we introduce a thread batch-aware scheduling (TBAS) scheme to improve GPU memory access locality and to reduce contention on memory controllers and interconnection networks. Experimental results show that integrating TEMP and TBAS achieves up to 10.3% performance improvement and 11.3% DRAM energy reduction across diverse GPU applications. We also evaluate the performance interference of mixed CPU+GPU workloads running on a heterogeneous system that employs our proposed schemes. Our results show that a simple solution can effectively ensure the efficient execution of both GPU and CPU applications.
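
The grouping and binding steps that the abstract attributes to TEMP can be illustrated with a small sketch. This is not the authors' implementation: the page size, the rule that blocks must touch exactly the same page set, the round-robin dispatch, and the names `pages_touched`, `form_thread_batches`, and `assign_batches` are all assumptions made for illustration.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes (not stated in the abstract)


def pages_touched(addresses, page_size=PAGE_SIZE):
    """Return the set of page numbers covered by one thread block's addresses."""
    return frozenset(addr // page_size for addr in addresses)


def form_thread_batches(block_addresses):
    """Group thread blocks whose accesses fall on exactly the same set of pages.

    block_addresses maps a thread block id to the byte addresses it accesses.
    Returns a dict from page set to the list of block ids in that batch.
    """
    batches = defaultdict(list)
    for block_id, addresses in sorted(block_addresses.items()):
        batches[pages_touched(addresses)].append(block_id)
    return dict(batches)


def assign_batches(batches, num_sms, banks_per_sm):
    """Dispatch batches to SMs round-robin (an assumed policy).

    Each SM owns a disjoint, contiguous range of banks, which stands in for
    the page color that binds the SM to dedicated memory banks.
    """
    placement = {}
    ordered = sorted(batches.items(), key=lambda kv: kv[1])
    for i, (pages, blocks) in enumerate(ordered):
        sm = i % num_sms
        banks = list(range(sm * banks_per_sm, (sm + 1) * banks_per_sm))
        placement[tuple(blocks)] = {"sm": sm, "banks": banks, "pages": sorted(pages)}
    return placement


# Hypothetical access traces for four thread blocks.
block_addresses = {
    0: [0, 64, 4096],
    1: [128, 4200],
    2: [8192, 8256],
    3: [8300],
}
batches = form_thread_batches(block_addresses)
placement = assign_batches(batches, num_sms=2, banks_per_sm=4)
```

In this toy trace, blocks 0 and 1 both touch pages 0 and 1 and so form one batch, while blocks 2 and 3 share page 2 and form another. Each batch lands on its own SM with its own bank range, so the two batches do not compete for the same banks.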

Journal: ACM Journal on Emerging Technologies in Computing Systems
Publication Date: Oct 31, 2019
Citations: 1
