PACC: a directive-based programming framework for out-of-core stencil computation on accelerators

Nobuhiro Miki,Kenichi Hagihara,Fumihiko Ino

doi:10.1504/ijhpcn.2019.097046

Nobuhiro Miki, Kenichi Hagihara + Show 1 more

Open Access

https://doi.org/10.1504/ijhpcn.2019.097046

Copy DOI

Abstract

We present a directive-based programming framework, i.e., the pipelined accelerator (PACC), to accelerate large-scale stencil computation on an accelerator device, such as a graphics processing unit (GPU). PACC provides a collection of extended OpenACC directives to facilitate out-of-core stencil computation accelerated using temporal blocking. The proposed framework includes a source-to-source translator capable of generating an out-of-core OpenACC code from the PACC code, i.e., large data is automatically decomposed into smaller chunks that are processed using limited capacity device memory. The generated code is optimised using a temporal blocking technique to minimise CPU-GPU data transfer. Furthermore, the code is accelerated using a multithreaded pipeline engine that maximises data copy throughput and overlaps GPU execution and data transfer. In experiments, we applied the proposed translator to three stencil computation codes. The out-of-core performance for 107 GB data on an NVIDIA Tesla K40 GPU with 12 GB memory reached 69.3 GFLOPS, which is 17% less than the in-core performance for 8 GB data. We believe that the proposed directive-based approach can be used to facilitate out-of-core stencil computation on a GPU.

Full Text