Abstract

The state-of-the-art parallel programming approaches OpenCL and CUDA require so-called host code for a program's execution. Implementing host code is often a cumbersome task, especially when executing OpenCL and CUDA programs on systems with multiple devices, e.g., a multi-core CPU and Graphics Processing Units (GPUs): the programmer is responsible for explicitly managing the system's main memory and the devices' memories, synchronizing computations with data transfers between main and/or devices' memories, and optimizing data transfers, e.g., by using pinned main memory for accelerating data transfers and overlapping the transfers with computations. In this paper, we present OCAL (OpenCL/CUDA Abstraction Layer) – a high-level approach to simplify the development of host code. OCAL combines five major advantages over the state-of-the-art high-level approaches: 1) it simplifies implementing both OpenCL and CUDA host code by providing a simple-to-use, uniform high-level host code abstraction API; 2) it supports executing arbitrary OpenCL and CUDA programs; 3) it simplifies implementing data-transfer optimizations by providing specially-optimized memory buffers, e.g., for conveniently using pinned main memory; 4) it optimizes memory management by automatically avoiding unnecessary data transfers; 5) it enables interoperability between OpenCL and CUDA host code by automatically managing the communication between OpenCL and CUDA data structures and by automatically translating between the OpenCL and CUDA programming constructs. Our experiments demonstrate that OCAL significantly simplifies implementing host code, with low runtime overhead for its abstraction.
