Abstract

This paper describes how to support atomics across multiple devices in heterogeneous processors. Specifically, it gives an overview of how OpenCL 2.0 and Heterogeneous System Architecture (HSA) atomics are supported on integrated CPU-GPU processors called Accelerated Processing Units (APUs). The C11 and C++11 standards recently introduced atomics and an associated memory model to support scalable parallel programming with well-defined memory consistency semantics. OpenCL 2.0 extends these atomics to multiple devices, each of which can be a CPU or a GPU. The HSA Foundation's HSA Intermediate Language (HSAIL) standard likewise includes atomic operations that span multiple devices. Together, these paradigms let parallel threads running simultaneously on CPU and GPU cores synchronize through atomics, which was not possible earlier. In APUs, the CPU and GPU cores are on the same die and can access a unified memory, so such a platform provides an excellent opportunity to showcase OpenCL 2.0/HSA atomics across devices (henceforth referred to as cross-device atomics). We show how we added capabilities to our LLVM-based OpenCL compiler and a JIT-like finalizer to support cross-device atomics on APUs. In addition, by supporting the new HSAIL atomic virtual operations in our finalizer, we enable other high-level languages that translate to HSAIL to support cross-device atomics as part of their evolving language standards. Our compiler is one of the first to support such cross-device atomics.
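As a rough illustration of the kind of synchronization the C++11 memory model makes possible, the following host-only sketch hands a value from one thread to another using a release store and an acquire load. It is not the compiler or finalizer work described above: the two CPU threads merely stand in for a CPU core and a GPU core, and the function name `handoff_once` is hypothetical. In OpenCL 2.0 C, the corresponding explicit atomic functions additionally take a memory scope argument (for example `memory_scope_all_svm_devices`) to make the ordering visible across devices.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: returns the value the consumer observes after a
// release/acquire handoff between two threads.
int handoff_once() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // synchronization flag

    std::thread producer([&] {
        payload = 42;                                  // plain write
        ready.store(true, std::memory_order_release);  // publish the write
    });

    int seen = -1;
    std::thread consumer([&] {
        // Spin until the flag is set; the acquire load pairs with the
        // release store, so the write to payload is visible afterwards.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Without the release/acquire pairing on `ready`, the read of `payload` would be a data race; cross-device atomics extend the same guarantee to threads running on different devices that share memory.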
