Abstract

As part of a project aimed at exploring the use of next-generation high-performance computing technologies for numerical weather prediction, we have ported two physics modules from the Common Community Physics Package (CCPP) to Graphics Processing Units (GPUs) and obtained speedups of up to 10× relative to a comparable multi-core CPU. The physics parameterizations accelerated in this work are the aerosol-aware Thompson microphysics (TH) scheme and the Grell–Freitas (GF) cumulus convection scheme. Microphysics schemes are among the most time-consuming physics parameterizations, second only to radiative process schemes, and our results show greater acceleration for the TH scheme than for the GF scheme. Multi-GPU implementations of the schemes show acceptable weak scaling on a single node with 8 GPUs and perfect weak scaling across multiple nodes using one GPU per node. The absence of inter-node communication in column physics parameterizations contributes to their scalability; however, physics parameterizations run alongside the dynamics, so overall multi-GPU performance is often governed by the latter. In the context of optimizing CCPP physics modules, our observations underscore that extensive use of automatic arrays within inner subroutines hampers GPU performance because it serializes memory allocations on the device. We used the OpenACC directive-based programming model for this work because it allows large amounts of code to be ported easily and makes code maintenance more manageable than lower-level approaches such as CUDA and OpenCL.
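The directive-based porting pattern and the automatic-array pitfall mentioned above can be illustrated with a minimal sketch. The code below is not taken from the CCPP schemes; it is a hedged C/OpenACC example in which the function names, the fixed level count `NLEV`, and the stand-in computation are all assumptions made for illustration. It shows a column loop offloaded with `#pragma acc parallel loop`, with the per-column work array declared in the driver and privatized per iteration rather than allocated as an automatic array inside the callee, since device-side allocations of the latter kind are serialized.

```c
/* Minimal sketch of an OpenACC column-physics loop (illustrative only).
 * physics_driver, column_update, and NLEV are hypothetical names, not
 * identifiers from the CCPP TH or GF schemes. */
#include <stddef.h>

#define NLEV 64  /* number of vertical levels, assumed fixed at compile time */

/* Per-column kernel, compiled for sequential execution on the device. */
#pragma acc routine seq
static void column_update(double *t, const double *scratch, int nlev)
{
    for (int k = 0; k < nlev; ++k)
        t[k] += scratch[k];
}

void physics_driver(double *restrict t, int ncol, int nlev)
{
    /* Work array hoisted to the caller and privatized per loop iteration.
     * Declaring it as an automatic (runtime-sized) array inside the device
     * routine would instead trigger per-thread heap allocations, which are
     * serialized on the GPU and hurt performance. */
    double scratch[NLEV];

    /* One independent column per iteration; column physics needs no
     * inter-column (and hence no inter-node) communication. */
    #pragma acc parallel loop gang vector private(scratch) \
                copy(t[0:(size_t)ncol * nlev])
    for (int i = 0; i < ncol; ++i) {
        for (int k = 0; k < nlev; ++k)
            scratch[k] = 0.5 * t[(size_t)i * nlev + k];  /* stand-in physics */
        column_update(&t[(size_t)i * nlev], scratch, nlev);
    }
}
```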
