Abstract
Modern memory access schedulers employed in GPUs typically optimize for memory throughput and implicitly assume that requests from different cores are equally important. However, we show that during the execution of a subset of CUDA applications, different cores can have different degrees of latency tolerance. In particular, cores with a larger fraction of warps waiting for data to return from DRAM are less able to tolerate the latency of an outstanding memory request, so requests from such cores are more critical than requests from others. Based on this observation, this paper introduces a new memory scheduler, called (C)ritica(L)ity (A)ware (M)emory (S)cheduler (CLAMS), which takes into account the latency tolerance of the cores that generate memory requests. The key idea is to use the fraction of critical requests in the memory request buffer to switch between scheduling policies optimized for criticality and for locality. If this fraction is below a threshold, CLAMS prioritizes critical requests to ensure that cores that cannot tolerate latency are serviced faster. Otherwise, CLAMS optimizes for locality, reasoning that when there are too many critical requests, prioritizing one over another would not significantly benefit performance. We first present a core-criticality estimation mechanism for identifying critical cores and requests, and then discuss how to balance criticality and locality in the memory scheduler. We progressively devise three variants of CLAMS and show that the Dynamic CLAMS variant provides significantly higher performance, across a variety of workloads, than commonly employed GPU memory schedulers optimized solely for locality. The results indicate that a GPU memory system that considers both core criticality and DRAM access locality can provide significant performance improvements.
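To make the switching rule concrete, the sketch below illustrates the kind of decision the abstract describes: when the fraction of critical requests in the request buffer is below a threshold, prioritize criticality; otherwise, fall back to a locality-optimized (row-hit-first) policy. This is a minimal C++ illustration written for exposition, not the paper's implementation; the request-buffer model, the per-request criticality flag, and the threshold value are assumptions.

#include <cstddef>
#include <vector>

// Hypothetical model of a pending DRAM request. In the paper, a request's
// criticality would be derived from the issuing core's latency tolerance
// (fraction of warps stalled on DRAM); here it is simply a flag.
struct MemoryRequest {
    bool critical;  // issued by a core with low latency tolerance (assumed flag)
    bool row_hit;   // targets the currently open DRAM row (locality hint)
};

enum class Policy { CriticalityFirst, LocalityFirst };

// Illustrative switching rule: if critical requests are scarce in the buffer,
// service them first; if most requests are critical, prioritizing among them
// gains little, so optimize for row-buffer locality instead.
// The default threshold is an assumption, not a value from the paper.
Policy choose_policy(const std::vector<MemoryRequest>& buffer,
                     double criticality_threshold = 0.5) {
    if (buffer.empty()) return Policy::LocalityFirst;
    std::size_t critical_count = 0;
    for (const auto& req : buffer)
        if (req.critical) ++critical_count;
    double critical_fraction =
        static_cast<double>(critical_count) / static_cast<double>(buffer.size());
    return (critical_fraction < criticality_threshold)
               ? Policy::CriticalityFirst
               : Policy::LocalityFirst;
}

In a hardware scheduler this decision would be re-evaluated continuously over the contents of the memory request buffer; the code above only captures the threshold comparison itself.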