A GPU-accelerated adaptive mesh refinement for immersed boundary methods

Hua Ji, Fue-Sang Lien, Fan Zhang

doi:10.1016/j.compfluid.2015.06.011

Abstract

A new patch-based adaptive mesh refinement (AMR) code is developed. The fully threaded tree (FTT) data structure (Khokhlov, 1998), originally developed for cell-based AMR, is extended to organize the patch-based adaptive meshes. We demonstrate the accuracy of the AMR code by simulating the two- and three-dimensional Sedov blast wave problems. We measure the performance of the code by solving the same problems at different grid resolutions on a single CPU, where detailed timing analyses compare the current AMR code with a conventional oct-tree-based AMR code (FLASH 4.0.1) (Fryxell et al., 2000). For the two-dimensional cases, a maximum speed-up ratio of 58.23 is obtained versus FLASH 4.0.1 at 512² effective grid resolution. For the three-dimensional cases, a maximum speed-up ratio of 32.16 is achieved over FLASH 4.0.1 at 128³ effective grid resolution. We present the implementation and performance of a patch-based AMR algorithm with the immersed boundary method (IBM) on graphics processing units (GPUs). We also apply several optimizations to the GPU computation, including asynchronous memory copy, concurrent execution between CPU and GPU, and hybrid MPI/OpenMP/GPU parallelization. Performance benchmarks are conducted on the GPU cluster on SHARCNET (https://www.sharcnet.ca/) using 1–8 Tesla M2070 GPUs. Maximum speed-up factors of 22.3 and 20.5 are demonstrated using one GPU and 8 GPUs, respectively, at 2048³ effective resolution.
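To make the notions of a patch tree and "effective grid resolution" concrete, the following is a minimal illustrative sketch, not the authors' FTT implementation. It assumes a refinement ratio of 2 between levels and a fixed number of cells per patch side; the names `Patch`, `refine`, `leaves`, and `effective_resolution` are hypothetical and do not come from the paper. Each patch keeps a pointer to its parent and to its children, which is the general idea behind a threaded tree, and the effective resolution is taken to be the resolution a uniform grid would need to match the finest level.

```python
import itertools
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Patch:
    """One block of cells in a patch-based AMR hierarchy (hypothetical layout)."""
    level: int                 # 0 for the root patch
    origin: Tuple[int, ...]    # patch index at this level, one entry per dimension
    # Parent pointer is excluded from repr/compare to avoid recursive cycles.
    parent: Optional["Patch"] = field(default=None, repr=False, compare=False)
    children: List["Patch"] = field(default_factory=list, repr=False, compare=False)

    def refine(self) -> List["Patch"]:
        """Split this patch into 2**dim children (assumed refinement ratio of 2)."""
        if self.children:      # already refined: nothing to do
            return self.children
        dim = len(self.origin)
        for offset in itertools.product((0, 1), repeat=dim):
            child_origin = tuple(2 * o + d for o, d in zip(self.origin, offset))
            self.children.append(Patch(self.level + 1, child_origin, parent=self))
        return self.children


def leaves(patch: Patch) -> Iterator[Patch]:
    """Yield the unrefined patches, i.e. those that actually carry the solution."""
    if not patch.children:
        yield patch
    else:
        for child in patch.children:
            yield from leaves(child)


def effective_resolution(root: Patch, cells_per_side: int) -> int:
    """Cells per side of a uniform grid matching the finest level present."""
    finest = max(p.level for p in leaves(root))
    return cells_per_side * 2 ** finest


if __name__ == "__main__":
    root = Patch(level=0, origin=(0, 0))   # a single 2D root patch
    kids = root.refine()                   # level 1: four patches
    kids[0].refine()                       # level 2: refine one corner only
    print(len(list(leaves(root))))         # number of leaf patches
    print(effective_resolution(root, 8))   # effective cells per side
```

Under these assumptions, a single root patch of 8 cells per side with two levels of refinement in one corner has an effective resolution of 32 cells per side, while storing only seven leaf patches instead of the sixteen a uniformly refined grid would need.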
