Abstract

OpenACC is a directive-based programming standard that aims to provide a highly portable programming model for massively parallel accelerators, such as general-purpose graphics processing units (GPGPUs), accelerated processing units (APUs), and the Many Integrated Core (MIC) architecture. The heterogeneous nature of these accelerators demands careful planning of data movement and novel parallel algorithms not commonly used in scientific computation. Following a concept similar to that of OpenMP, the directive-based approach of OpenACC hides many underlying implementation details, significantly reducing programming complexity and increasing code portability. However, many challenges remain, owing to the relatively narrow interconnect bandwidth among GPUs and the very fine granularity of the GPGPU architecture. The former is particularly restrictive when cross-node data exchange is involved in a cluster environment. Furthermore, the fine-grained parallelism of GPGPUs conflicts with certain inherently serial algorithms, placing further limits on performance. In our study, an implicit multi-block incompressible Navier-Stokes solver is ported to GPGPUs using OpenACC and MVAPICH2. A performance analysis, based on profiling of this solver running on an InfiniBand cluster with NVIDIA GPUs, helps to identify the potential of directive-based GPU programming and directions for further improvement.
