Abstract

This paper presents an OpenCL-like offload programming framework for NEC SX-Aurora TSUBASA (SX-Aurora) and discusses the benefit of using metaprogramming to describe the architecture-specific parts of programs. Unlike traditional vector systems, one node of an SX-Aurora system consists of a host processor, called the vector host, and one or more vector processors on PCI-Express cards, called vector engines. Since the standard OpenCL execution model does not naturally fit the vector engine, this paper discusses how to adapt the OpenCL specification to SX-Aurora while considering the trade-off between performance and code portability. The framework uses OpenCL to minimize the non-portable parts of application code for offload programming, and metaprogramming to describe the non-portable parts that remain. Performance evaluation results demonstrate that, with moderate programming effort, the proposed framework can express the collaboration between a vector host and a vector engine so as to make good use of both processors. By delegating the right task to the right processor, an OpenCL-like program can fully exploit the performance of SX-Aurora. Moreover, metaprogramming can express vectorization-aware performance optimizations that enhance performance portability across different architectures, including SX-Aurora.
