A Multi-Level Platform-Independent GPU API for High-Level Programming Models

Akihiro Hayashi, Vivek Sarkar, Sri Raj Paul

doi:10.1007/978-3-031-23220-6_7

Abstract

While there has been growing interest in supporting accelerators, especially GPU accelerators, in large-scale systems, the user typically has to work with low-level GPU programming models such as CUDA along with the low-level Message Passing Interface (MPI). We believe higher-level programming models such as Partitioned Global Address Space (PGAS) programming models enable productive parallel programming at both the intra-node and inter-node levels in homogeneous and heterogeneous nodes. However, GPU programming with PGAS languages in practice is still limited, since there is still a big performance gap between compiler-generated GPU code and hand-tuned GPU code; hand-optimization of CPU-GPU data transfers is also an important contributor to this performance gap. Thus, it is not rare that the user eventually writes a fully external GPU program that includes the host part (i.e., GPU memory (de)allocation and host-device/device-host data transfer) and the device part (i.e., GPU kernels), and calls it from their primary language, which is not very productive.

Our key observation is that the complexity of writing the external GPU program comes not only from writing GPU kernels in the device part, but also from writing the host part. In particular, interfacing objects in the primary language to raw C/C++ pointers is tedious and error-prone, especially because high-level languages usually have a well-defined type system with type inference.

In this paper, we introduce the GPUAPI module, which offers multiple abstraction levels of low-level GPU API routines for high-level programming models, with a special focus on PGAS languages. This allows the user to choose an appropriate abstraction level depending on their tuning scenario. The module is also designed to work with multiple standard low-level GPU programming models (CUDA, HIP, DPC++, and SYCL), thereby significantly improving productivity and portability.

We use Chapel as the primary example. Our preliminary performance and productivity evaluations show that the GPUAPI module significantly simplifies GPU programming in a high-level programming model like Chapel, while targeting different multi-node CPU+GPU platforms with no performance loss.

Keywords: GPUs, Chapel, PGAS languages, Distributed programming model, GPU API library
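To make the "host part" burden and the idea of multiple abstraction levels concrete, here is a minimal sketch in Python. It is not the GPUAPI module's interface: every name in it (`dev_malloc`, `dev_memcpy`, `scale_kernel`, `DeviceArray`) is hypothetical, and "device" memory is simulated with a host-side `ctypes` buffer so the example runs without a GPU. The low-level path makes the caller track byte counts and C element types by hand; the mid-level wrapper derives both from the host array.

```python
import ctypes
from array import array

# --- Low level (hypothetical): thin malloc/memcpy-style routines. ---
# "Device" memory is simulated by a host-side buffer; no GPU is involved.

def dev_malloc(nbytes):
    """Return an untyped buffer standing in for a device allocation."""
    return ctypes.create_string_buffer(nbytes)

def dev_memcpy(dst, src, nbytes):
    """Copy nbytes between buffers (stands in for H2D / D2H transfers)."""
    ctypes.memmove(dst, src, nbytes)

def scale_kernel(dev_buf, n, factor):
    """Stand-in for a GPU kernel: scales n 32-bit floats in place."""
    data = ctypes.cast(dev_buf, ctypes.POINTER(ctypes.c_float))
    for i in range(n):
        data[i] *= factor

def run_low_level(host, factor):
    """Caller handles sizes, element types, and both transfers by hand."""
    n = len(host)
    nbytes = n * host.itemsize                          # size tracked manually
    host_view = (ctypes.c_float * n).from_buffer(host)  # C type chosen manually
    dev = dev_malloc(nbytes)
    dev_memcpy(dev, host_view, nbytes)                  # host -> device
    scale_kernel(dev, n, factor)
    dev_memcpy(host_view, dev, nbytes)                  # device -> host

# --- Mid level (hypothetical): wrapper that infers size and element type. ---

class DeviceArray:
    _CTYPES = {"f": ctypes.c_float, "d": ctypes.c_double}

    def __init__(self, host):
        self.n = len(host)
        self.nbytes = self.n * host.itemsize
        elem = self._CTYPES[host.typecode]              # type taken from the array
        self._view = (elem * self.n).from_buffer(host)
        self._dev = dev_malloc(self.nbytes)

    def to_device(self):
        dev_memcpy(self._dev, self._view, self.nbytes)

    def from_device(self):
        dev_memcpy(self._view, self._dev, self.nbytes)

    def dptr(self):
        """Raw handle to pass to a kernel."""
        return self._dev

# Same computation at both levels.
a = array("f", [1.0, 2.0, 3.0])
run_low_level(a, 2.0)

b = array("f", [1.0, 2.0, 3.0])
d = DeviceArray(b)
d.to_device()
scale_kernel(d.dptr(), d.n, 2.0)
d.from_device()

print(list(a), list(b))
```

Both paths run the same kernel; the difference is only in how much pointer and size bookkeeping the caller writes, which is the trade-off the abstract describes when it says the user can pick an abstraction level to suit the tuning scenario.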

Similar Papers

Optimizing Collective Communication in UPC
Jithin Jose ... Dhabaleswar K Panda
01 May 2014

Gasimo: a global address space simulation model
Worawan Marurngsith ... Roland N Ibbett
01 Jan 2009

Experiences with UPC on TILE-64 processor
Olivier Serres ... Tarek El-Ghazawi
01 Mar 2011

Hybrid-view programming of nuclear fusion simulation code in the PGAS parallel programming language XcalableMP
Keisuke Tsugane ... Hitoshi Murai
Parallel Computing, Vol. 57
01 Jun 2016
