Abstract

The material point method (MPM) is computationally costly and highly parallelisable. With the plateauing of Moore's law and recent advances in parallel computing, scientists without formal programming training might face challenges in developing fast scientific codes for their research. Parallel programming is intrinsically different from serial programming and can seem daunting, particularly when targeting GPUs. However, recent developments in GPU application programming interfaces (APIs) have made it easier than ever to port codes to GPU. This paper explains how we ported our modular C++ MPM code to GPU without using low-level hardware APIs like CUDA or OpenCL. We aimed to develop a code that has abstracted parallelism and is therefore hardware agnostic. We first present an investigation of a variety of GPU APIs, comparing ease of use, hardware support and performance in an MPM context. Then, the process of porting the code to the Kokkos ecosystem is detailed, discussing key design patterns and challenges. Finally, our parallel C++ code running on GPU is shown to be up to 85 times faster than on CPU. Since Kokkos also supports Python and Fortran, the principles presented herein can also be applied to codes written in these languages.