Abstract

The emergence of heterogeneous processors such as GPUs provides massively parallel computing power but also exacerbates the difficulties of parallel programming. Although low-level programming methods such as CUDA and OpenCL can yield good performance, programming productivity is poor and applications lack portability. In this paper, we present Ruler, a core language that extends C with high-level parallel constructs. These constructs let programmers express parallelism without concerning themselves with runtime details, thereby easing programming. We present the operational semantics of the language and show how these constructs preserve the parallel patterns and the degree of parallelism of high-level applications. This information allows the compiler to generate efficient code and to maintain performance across different platforms. We have implemented a compiler and runtime system for Ruler on top of OpenCL. Multiple benchmarks are rewritten in Ruler and evaluated on both an NVIDIA GPU and an Intel MIC platform to demonstrate the effectiveness of our techniques. The Ruler code is only 13% to 64% of the size of the corresponding OpenCL code. After compilation, the rewritten benchmarks run on both platforms and achieve performance competitive with that of the handcrafted OpenCL benchmark code.
