Efficient Sequential Consistency in GPUs via Relativistic Cache Coherence

Xiaowei Ren, Mieszko Lis

doi:10.1109/hpca.2017.40

Abstract

Recent work has argued that sequential consistency (SC) in GPUs can perform on par with weak memory models, provided ordering stalls are made less frequent by relaxing ordering for private and read-only data. In this paper, we address the complementary problem of reducing stall latencies for both read-only and read-write data. We find that SC stalls are particularly problematic for workloads with inter-workgroup sharing and occur primarily due to earlier stores in the same thread; a substantial part of the overhead comes from the need to stall until write permissions are obtained (to ensure write atomicity). To address this, we propose RCC, a GPU coherence protocol that grants write permissions without stalling but can still be used to implement SC. RCC uses logical timestamps to determine a global memory order and L1 read permissions; even though each core may see a different logical time, SC ordering can still be maintained. Unlike previous GPU SC proposals, our design does not require invasive core changes or additional per-core storage to classify read-only/private data. For workloads with inter-workgroup sharing, overall performance is 29% better and energy is 25% lower than in the best previous GPU SC proposals, and within 7% of the best non-SC design.
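The abstract describes RCC only at a high level. As a rough illustration of the general idea, the Python sketch below models logical-timestamp leases in the spirit of timestamp-based protocols such as Tardis; it is an assumption-laden simplification, not the paper's actual protocol. Each core keeps its own logical clock, an L1 copy may be read only while the core's clock is within the copy's read lease, and a store never waits for invalidations: it is simply timestamped after every outstanding lease, and the writer's clock jumps ahead. The names `LEASE`, `L2Line`, `L1Copy`, `Core`, `wts`, and `rts` are hypothetical, and the model ignores physical time, evictions, and concurrency.

```python
from dataclasses import dataclass

LEASE = 10  # hypothetical read-lease length, in logical ticks


@dataclass
class L2Line:
    value: int = 0
    wts: int = 0  # logical time at which the current value was written
    rts: int = 0  # latest logical time up to which some L1 may still read it


@dataclass
class L1Copy:
    value: int
    wts: int
    rts: int


class Core:
    """Hypothetical core with a private logical clock and a write-through L1."""

    def __init__(self, l2):
        self.now = 0   # this core's own logical time (may differ per core)
        self.l1 = {}   # addr -> L1Copy
        self.l2 = l2   # shared dict: addr -> L2Line

    def load(self, addr):
        copy = self.l1.get(addr)
        if copy is not None and self.now <= copy.rts:
            # L1 hit: the lease still covers this core's logical time,
            # even if another core has since written a newer version.
            return copy.value
        line = self.l2.setdefault(addr, L2Line())
        # Advance logical time at least to when this version was written.
        self.now = max(self.now, line.wts)
        # Extend the read lease so this (and later) L1 hits are covered.
        line.rts = max(line.rts, self.now + LEASE)
        self.l1[addr] = L1Copy(line.value, line.wts, line.rts)
        return line.value

    def store(self, addr, value):
        line = self.l2.setdefault(addr, L2Line())
        # No invalidations and no stall: the write is placed logically
        # after every outstanding read lease on this line.
        self.now = max(self.now, line.rts + 1)
        line.value, line.wts, line.rts = value, self.now, self.now
        self.l1[addr] = L1Copy(value, line.wts, line.rts)


if __name__ == "__main__":
    l2 = {}
    reader, writer = Core(l2), Core(l2)
    print(reader.load("data"))  # 0; reader now holds a lease on "data"
    writer.store("data", 1)     # no stall: timestamped after the lease
    print(reader.load("data"))  # 0; legal, reader's logical time precedes the write
    writer.store("flag", 1)
    print(reader.load("flag"))  # 1; reader's clock jumps to the write's timestamp
    print(reader.load("data"))  # 1; old lease expired in logical time, so it refetches
```

The demo is the classic message-passing pattern: a stale L1 hit is allowed only while the reader is logically "in the past", and once it observes `flag`, its clock has moved past the write to `data`, so it cannot see the old value. In this sketch, ordering in logical time, rather than waiting for write permissions, is what keeps the outcome sequentially consistent.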

Similar Papers

Scheduling constraint based abstraction refinement for weak memory models
Liangze Yin ... Wanwei Liu
03 Sep 2018

AutoMO: automatic inference of memory order parameters for C/C++11
Peizhao Ou ... Brian Demsky
ACM SIGPLAN Notices | VOL. 50
23 Oct 2015

Verification of Concurrent Programs on Weak Memory Models
Oleg Travkin ... Heike Wehrheim
01 Jan 2015