Abstract

Heterogeneous computing is widely used because accelerators such as the Graphics Processing Unit (GPU) and the Intel Many Integrated Core (MIC) coprocessor can offer an order of magnitude more compute power for arithmetic-intensive, data-parallel workloads. However, heterogeneous programming is more complicated because the CPU and the MIC do not share memory. Programmers must distinguish between local and remote data accesses and transfer data between the CPU and the MIC explicitly. Furthermore, standard offload programming models such as Intel Language Extensions for Offload (LEO) are restricted to a single compute node, and hence to a limited number of coprocessors, which further complicates the efficient use of MIC for heterogeneous computations. In this paper, we propose CoGA, an extension of Global Array (GA) for heterogeneous systems consisting of CPUs and MICs. CoGA is implemented on top of the Symmetric Communication Interface (SCIF), a sockets-like API for communication between processes on the MIC and the host within the same system. CoGA provides a shared memory abstraction between the CPU and the MIC and simplifies programming by allowing programmers to access shared data regardless of where the referenced data is located. Our evaluation of data transmission bandwidth and of a sparse matrix-vector multiplication problem shows that CoGA is practical and effective.

