Abstract

In this work, we focus on a systematic adaptation of the stencil‐based multidimensional positive definite advection transport algorithm (MPDATA) to different graphics processing unit (GPU)‐based computing platforms. Another objective of this work is to compare the performance of MPDATA on several platforms, including a multi‐GPU system with two NVIDIA Tesla K80 cards, and single‐card platforms with Tesla K20X, GeForce GTX TITAN, and GeForce GTX 980. The usage of the following optimization methods is proposed to improve the overall performance: (i) reducing the number of operations by subexpression elimination when implementing 2.5D blocking; (ii) reorganization of boundary conditions to reduce branch instructions; (iii) advanced memory management to increase coalesced memory access; and (iv) warp rearrangement to optimize data access to GPU global memory. The presented methods of adapting MPDATA to GPU architectures allow us to efficiently use many graphics processors within a single node by applying peer‐to‐peer data transfers between GPU global memories. We propose an auto‐tuning procedure to compensate for architectural differences between the considered platforms. This procedure takes into account algorithm‐ and GPU‐specific parameters. The proposed approach to adapting MPDATA to GPU architectures allows us to achieve up to 482.5 Gflop/s on the platform equipped with two NVIDIA K80 GPUs. Copyright © 2016 John Wiley & Sons, Ltd.
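The 2.5D blocking mentioned in (i) can be illustrated with a simplified CPU analogue: tile the x–y plane and march along the z dimension within each tile, so that (on a GPU) the three active planes can be held in fast memory. This is a generic 7-point stencil sketch, not the authors' MPDATA kernels; all function names below are hypothetical.

```python
import numpy as np

def naive_stencil(a):
    # Reference 7-point averaging sweep over the interior of a 3D array.
    out = a.copy()
    out[1:-1, 1:-1, 1:-1] = (
        a[1:-1, 1:-1, 1:-1]
        + a[:-2, 1:-1, 1:-1] + a[2:, 1:-1, 1:-1]
        + a[1:-1, :-2, 1:-1] + a[1:-1, 2:, 1:-1]
        + a[1:-1, 1:-1, :-2] + a[1:-1, 1:-1, 2:]
    ) / 7.0
    return out

def blocked_stencil(a, tile=4):
    # 2.5D blocking: tile the x-y plane and stream along z inside each
    # tile. In a GPU kernel, the k-1, k, and k+1 planes of the tile
    # would be kept in registers/shared memory and reused as k advances.
    nx, ny, nz = a.shape
    out = a.copy()
    for i0 in range(1, nx - 1, tile):
        for j0 in range(1, ny - 1, tile):
            i1 = min(i0 + tile, nx - 1)
            j1 = min(j0 + tile, ny - 1)
            for k in range(1, nz - 1):  # stream through the z dimension
                out[i0:i1, j0:j1, k] = (
                    a[i0:i1, j0:j1, k]
                    + a[i0-1:i1-1, j0:j1, k] + a[i0+1:i1+1, j0:j1, k]
                    + a[i0:i1, j0-1:j1-1, k] + a[i0:i1, j0+1:j1+1, k]
                    + a[i0:i1, j0:j1, k-1] + a[i0:i1, j0:j1, k+1]
                ) / 7.0
    return out
```

Both sweeps compute the same result; the blocked variant only reorders the traversal so that each x–y tile's working set stays small while z is streamed, which is the property the GPU implementation exploits.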
