Abstract

As more emerging applications are moving to GPUs, fine-grained synchronization has become imperative. However, their performance can be severely impaired in case of frequent synchronization failures caused by high data contention. Differently from CPUs, GPUs own thousands of hardware threads and adopt single instruction multiple threads paradigm, making it impractical to deploy the CPU contention management mechanisms directly on GPUs. In this article, we design a Software Warp Controlling Framework (SWCF), which employs producer-consumer execution model and leverages GPU hardware barriers to dynamically control the execution of warps at runtime. On the basis of SWCF, we propose a contention management strategy to decrease frequent synchronization failures while avoiding the over-reducing of parallelism. We evaluate SWCF and the proposed strategy on commodity GPUs using a set of applications with fine-grained synchronization. The results show that on V100 GPU our contention management achieves a 4.7X speedup and outperforms the conventional GPU software backoff solution by 42% on average.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call