GPU acceleration of the WRF Purdue Lin cloud microphysics scheme

Jun Wang, Mitchell D Goldberg, Hung-Lung A Huang, Bormin Huang

doi:10.1117/12.901825

Abstract

The Weather Research and Forecasting (WRF) model is a numerical weather prediction and atmospheric simulation system designed for both research and operational applications. WRF code can run in computing environments ranging from laptops to supercomputers. The Purdue Lin scheme is a relatively sophisticated microphysics scheme in WRF. It includes six classes of hydrometeors: water vapor, cloud water, rain, cloud ice, snow and graupel. In this paper, we accelerate the Purdue Lin scheme on multi-core NVIDIA Graphics Processing Units (GPUs). GPUs have recently evolved into highly parallel, multi-threaded, many-core processors with tremendous computational speed and high memory bandwidth. We discuss how our GPU implementation exploits this massive parallelism, resulting in a highly efficient acceleration of the Purdue Lin scheme. We use a low-cost personal supercomputer with 512 CUDA cores on a GTX590 GPU. With one GPU, we achieve an overall speedup of 156× compared to the single-threaded CPU version. Since the Purdue Lin microphysics scheme is only an intermediate module of the entire WRF model, host-device I/O should not be needed: its input data is already available in GPU global memory from previous modules, and its output data should remain in GPU global memory for later use by other modules. The speedup without host-device data transfer time is 692×.
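The two speedup figures can be related by simple arithmetic. The sketch below assumes, as the abstract suggests but does not state outright, that both figures are measured against the same single-threaded CPU baseline; the function name `transfer_fraction` is illustrative and not from the paper. Under that assumption, it estimates what share of the GPU wall time in the 156× case goes to host-device transfers.

```python
def transfer_fraction(speedup_with_io: float, speedup_without_io: float) -> float:
    """Estimate the share of GPU wall time spent on host-device transfers.

    Assumes both speedups use the same CPU baseline, normalized here to 1.0.
    """
    t_total = 1.0 / speedup_with_io       # GPU time including transfers
    t_compute = 1.0 / speedup_without_io  # GPU time for the kernel alone
    return (t_total - t_compute) / t_total


# Figures quoted in the abstract: 156x with transfers, 692x without.
frac = transfer_fraction(156.0, 692.0)
print(f"Estimated transfer share of GPU wall time: {frac:.1%}")
```

With the quoted figures this comes to roughly three quarters of the GPU wall time, which is consistent with the argument for keeping the scheme's inputs and outputs resident in GPU global memory between modules.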

Full Text