Abstract

Thread communication and synchronization play an important role in parallel computing. On graphics processing units (GPUs), where thousands of threads run simultaneously, processor performance depends heavily on the efficiency of thread communication and synchronization. To write efficient code, GPU programmers also need to understand which mechanisms modern GPUs support and what those mechanisms imply for algorithm design. Because most conventional general-purpose GPU (GPGPU) workloads are massively parallel with little cooperation among threads, early GPUs supported only coarse-grained thread communication and synchronization. However, GPUs are now used to accelerate increasingly diverse workloads, and coarse-grained mechanisms have become a major limiting factor in exploiting their parallelism. The latest industry-standard programming framework, OpenCL 2.0, addresses this issue by introducing support for fine-grained thread communication and synchronization. In this chapter, we present both the coarse-grained and the fine-grained thread communication and synchronization mechanisms available on modern GPUs, together with their impact on performance.
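To make the coarse-grained versus fine-grained distinction concrete, here is a minimal CPU-side sketch. It is not taken from the chapter and uses C++ threads as a stand-in for GPU threads. The function names `coarse_grained_sum` and `chained_prefix_sum` are hypothetical. In the coarse-grained version, partial results can be combined only after *every* worker has finished. The `join()` calls play the role that a kernel boundary played as the only global synchronization point on early GPUs. In the fine-grained version, each worker waits only on its immediate predecessor through a per-element atomic flag with release/acquire ordering. This is the C++11-style memory model on which OpenCL 2.0's memory model is based.

```cpp
#include <atomic>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Coarse-grained: each worker sums a strided slice into its own slot, and
// the final reduction may run only after ALL workers finish (join() acts
// as a global barrier, analogous to waiting for a whole kernel to end).
// Assumes num_workers > 0.
long coarse_grained_sum(const std::vector<int>& data, int num_workers) {
    std::vector<long> partial(num_workers, 0);
    std::vector<std::thread> workers;
    for (int w = 0; w < num_workers; ++w) {
        workers.emplace_back([&, w] {
            // Worker w handles indices w, w + num_workers, w + 2*num_workers, ...
            for (std::size_t i = w; i < data.size(); i += num_workers)
                partial[w] += data[i];
        });
    }
    for (auto& t : workers) t.join();  // everyone must arrive before combining
    return std::accumulate(partial.begin(), partial.end(), 0L);
}

// Fine-grained: worker i synchronizes point-to-point with worker i-1 only,
// via a per-element flag. The release store publishes prefix[i]; the
// acquire load guarantees the successor sees that write.
std::vector<long> chained_prefix_sum(const std::vector<int>& data) {
    const std::size_t n = data.size();
    std::vector<long> prefix(n, 0);           // ordinary (non-atomic) results
    std::vector<std::atomic<bool>> ready(n);  // one flag per element
    for (auto& f : ready) f.store(false, std::memory_order_relaxed);

    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < n; ++i) {
        workers.emplace_back([&, i] {
            long before = 0;
            if (i > 0) {
                // Wait for the predecessor only, not for all threads.
                while (!ready[i - 1].load(std::memory_order_acquire))
                    std::this_thread::yield();
                before = prefix[i - 1];  // visible due to acquire/release pairing
            }
            prefix[i] = before + data[i];
            ready[i].store(true, std::memory_order_release);  // publish result
        });
    }
    for (auto& t : workers) t.join();
    return prefix;
}
```

For example, `coarse_grained_sum({1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 3)` yields 55, and `chained_prefix_sum({1, 2, 3, 4})` yields `{1, 3, 6, 10}`. The chained scan deliberately serializes the dependency to illustrate point-to-point cooperation; it is not an efficient scan. Spawning one thread per element is also only for illustration. On a real GPU, spin-waiting on another work-group's flag additionally assumes that the awaited thread is actually scheduled, which is not guaranteed in general.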
