OpenCL-like offloading with metaprogramming for SX-Aurora TSUBASA

Hiroyuki Takizawa, Shinji Shiotsuki, Naoki Ebata, Ryusuke Egawa

doi:10.1016/j.parco.2021.102754

Open Access

https://doi.org/10.1016/j.parco.2021.102754

Journal: Parallel Computing
Publication Date: Feb 10, 2021
Citations: 4
License type: CC BY

Affiliation: Tohoku University, Tokyo Denki University

Abstract

This paper presents an OpenCL-like offload programming framework for NEC SX-Aurora TSUBASA (SX-Aurora) and discusses the benefit of employing metaprogramming to describe the architecture-specific parts of a program. Unlike traditional vector systems, one node of an SX-Aurora system consists of a host processor, called a vector host, and one or more vector processors on PCI-Express cards, called vector engines. Since the standard OpenCL execution model does not naturally fit the vector engine, this paper discusses how to adapt the OpenCL specification to SX-Aurora while considering the trade-off between performance and code portability. The paper employs OpenCL to minimize the non-portable parts of an application code for offload programming, and then uses metaprogramming to describe those non-portable parts. Performance evaluation results clearly demonstrate that, with moderate programming effort, the proposed framework can express the collaboration between a vector host and a vector engine so as to make good use of both processors. By delegating the right task to the right processor, an OpenCL-like program can fully exploit the performance of SX-Aurora. Moreover, metaprogramming can express vectorization-aware performance optimizations that enhance performance portability across different architectures, including SX-Aurora.

Full Text