Abstract
Multiple CPUs and GPUs integrated on the same chip share memory, and memory requests from different cores interfere with one another. Memory requests from the GPU severely degrade CPU memory access performance; requests from multiple CPU applications are intertwined when accessing memory, which greatly degrades their performance; and differences in access latency between GPU cores increase the average memory access latency. To address these problems in the shared memory of heterogeneous multi-core systems, we propose a step-by-step memory scheduling strategy that improves system performance. When the memory controller receives a memory request, the strategy first places it in a new request queue based on its source, isolating CPU requests from GPU requests and thereby preventing GPU requests from interfering with CPU requests. Then, for the CPU request queue, a dynamic bank partitioning strategy dynamically maps applications to different bank sets according to their memory access characteristics, eliminating memory request interference among CPU applications without sacrificing bank-level parallelism. Finally, for the GPU request queue, criticality is introduced to measure the difference in memory access latency between cores; building on the first-ready, first-come-first-served (FR-FCFS) policy, we implement criticality-aware memory scheduling to balance the locality and criticality of application accesses.
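The three steps above can be sketched in simplified form as follows. This is an illustrative sketch only, not the paper's implementation: all names (`MemoryRequest`, `StepScheduler`, the bank count, and the two-way intensive/sensitive split used for bank partitioning) are assumptions introduced for the example.

```python
from collections import deque
from dataclasses import dataclass

NUM_BANKS = 8  # assumed bank count for this sketch


@dataclass
class MemoryRequest:
    arrival: int          # arrival time at the memory controller
    source: str           # "CPU" or "GPU"
    app_id: int           # issuing CPU application or GPU core
    row: int              # target DRAM row
    criticality: int = 0  # GPU-side latency-tolerance estimate (higher = more critical)


class StepScheduler:
    def __init__(self):
        # Step 1: separate queues keyed on the request source, so GPU
        # traffic cannot delay CPU requests inside a shared queue.
        self.cpu_queue = deque()
        self.gpu_queue = deque()
        self.bank_map = {}  # app_id -> set of banks (filled in step 2)

    def enqueue(self, req):
        (self.cpu_queue if req.source == "CPU" else self.gpu_queue).append(req)

    def partition_banks(self, app_profiles):
        # Step 2 (dynamic bank partitioning, simplified): give
        # memory-intensive and latency-sensitive CPU applications
        # disjoint bank sets so their row-buffer working sets cannot
        # evict each other, while each set still spans several banks
        # to preserve bank-level parallelism.
        intensive = [a for a, p in app_profiles.items() if p["intensive"]]
        half = NUM_BANKS // 2
        for a in app_profiles:
            self.bank_map[a] = (set(range(half)) if a in intensive
                                else set(range(half, NUM_BANKS)))

    def next_gpu_request(self, open_row):
        # Step 3 (criticality-aware FR-FCFS, simplified): keep the
        # row-hit-first rule for locality, but among the candidates
        # serve the most critical GPU core first, breaking ties by age.
        if not self.gpu_queue:
            return None
        hits = [r for r in self.gpu_queue if r.row == open_row]
        candidates = hits if hits else list(self.gpu_queue)
        best = max(candidates, key=lambda r: (r.criticality, -r.arrival))
        self.gpu_queue.remove(best)
        return best
```

For example, with two GPU requests hitting the open row, the scheduler serves the one from the more critical core even if it arrived later, which is how criticality is traded against pure arrival order.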
Highlights
Increasing computing demand has drawn growing attention to heterogeneous computing in recent years
In a CPU + GPU heterogeneous system built with gem5-gpu [11], we evaluated the memory access scheduling strategy; experimental results show that the step-by-step memory scheduling strategy improves system performance
In the heterogeneous multi-core system built with gem5-gpu, the default memory access scheduling policy assigns priority based on the row buffer hit rate, which seriously degrades the memory access performance of CPU applications
Summary
Increasing computing demand has drawn growing attention to heterogeneous computing in recent years. This paper introduced the challenges that the introduction of the GPU poses for memory access scheduling in heterogeneous multi-core systems: (1) the large number of GPU memory requests limits the visibility of existing scheduling algorithms into the access behavior of CPU applications; (2) memory requests from multiple CPU applications executing in parallel interfere with one another; (3) GPU cores have different latency tolerances, which produces differences in memory access latency among them. To address these challenges, we propose a step-by-step memory access strategy: it isolates CPU requests from GPU requests, limits different classes of applications to different bank sets to eliminate interference when multiple applications execute in parallel, and applies criticality-aware scheduling to the GPU request queue. The strategy thereby reduces the interference of GPU access requests with CPU access requests, reduces bank conflicts while improving bank-level parallelism, and narrows the memory access latency differences between GPU cores.
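For context, the baseline policy referred to above, first-ready, first-come-first-served (FR-FCFS), can be sketched as follows. The sketch (function and field names are illustrative) shows why a shared queue hurts CPU applications: a GPU stream with high row-buffer locality keeps winning the row-hit test, so an older CPU request to a different row is repeatedly deferred.

```python
def fr_fcfs(queue, open_row):
    """Baseline FR-FCFS over a single shared queue (simplified):
    serve a row-buffer hit if one exists, otherwise the oldest
    request. Requests are plain dicts with 'arrival', 'source',
    and 'row' keys (names assumed for this sketch)."""
    hits = [r for r in queue if r["row"] == open_row]
    candidates = hits if hits else queue
    best = min(candidates, key=lambda r: r["arrival"])
    queue.remove(best)
    return best
```

With an open row that matches the GPU stream, the oldest request in the queue can be a CPU request and still lose to a younger GPU row hit, which is the starvation effect the step-by-step strategy's queue isolation removes.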