Abstract

As GPUs are increasingly used for general-purpose computation, applications with diverse memory access requirements have emerged. Despite this growing demand, only a few GPU coherence protocols and memory models have been explored in research, and even fewer have been implemented in products. In the CPU domain, by contrast, a diverse range of memory models for parallel programming has been proposed, exploring the interplay between performance and programmability. Sequential consistency (SC) is one of the strictest memory models: it provides the most programmer-intuitive execution of memory operations, but it imposes strict ordering restrictions on those operations that incur performance overhead. Hence, implementing and supporting SC is one of the most challenging tasks on any computing platform, and GPUs are no exception. In this paper, we propose a GPU architecture that implements the SC memory model with minimal performance and power overhead. We achieve this goal by designing a mechanism that detects races between different streaming multiprocessors (SMs) dynamically at runtime. Races are detected using a signature-based mechanism that tracks the set of unseen updates for each SM, which significantly reduces the hardware implementation cost at the price of a small increase in invalidation traffic. Our experiments show that dynamic race detection can be used to implement sequential consistency with a 5% performance overhead.
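
To illustrate the flavor of a signature-based mechanism for tracking unseen updates, the sketch below models a Bloom-filter-style per-SM signature in C++. This is not the paper's hardware design; the class name `Signature`, the signature width, the hash functions, and the method names are all illustrative assumptions. The key property it demonstrates is that membership tests are conservative: false positives only cause extra invalidations, never a missed race.

```cpp
// Minimal sketch (assumptions, not the paper's RTL) of a Bloom-filter-style
// signature an SM could keep to summarize stores from other SMs that it has
// not yet observed.
#include <array>
#include <cstdint>
#include <functional>

class Signature {
    static constexpr std::size_t kBits = 1024;      // signature width (assumed)
    std::array<bool, kBits> bits_{};                // bit vector, all false initially

    // Two simple hash functions over the memory block address (illustrative only).
    static std::size_t h1(std::uint64_t addr) {
        return std::hash<std::uint64_t>{}(addr) % kBits;
    }
    static std::size_t h2(std::uint64_t addr) {
        return std::hash<std::uint64_t>{}(addr * 0x9E3779B97F4A7C15ULL) % kBits;
    }

public:
    // Record a store committed by another SM that this SM has not yet seen.
    void record_remote_store(std::uint64_t block_addr) {
        bits_[h1(block_addr)] = true;
        bits_[h2(block_addr)] = true;
    }

    // Conservative membership test used on a local access: a "true" result may
    // be a false positive (extra invalidation traffic), but "false" guarantees
    // no unseen update to this block, which keeps SC orderings intact.
    bool may_race(std::uint64_t block_addr) const {
        return bits_[h1(block_addr)] && bits_[h2(block_addr)];
    }

    // Cleared once the SM has invalidated or refreshed its stale copies.
    void clear() { bits_.fill(false); }
};
```

Compared with tracking every unseen address exactly, a fixed-width signature like this trades precision for a small, bounded hardware cost, which matches the abstract's claim of reduced implementation cost at the price of slightly more invalidation traffic.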
