Abstract

In previous works (Jahanbakhsh et al., CMAME 298 (2016): 80–107; Jahanbakhsh et al., CMAME 317 (2017): 102–127; see [1] and [2]), the authors introduced SPHEROS, a 3-D particle-based solver based on the Finite Volume Particle Method (FVPM) with a spherical top-hat kernel. In the present work, the authors describe the algorithms and optimization procedures that made it possible to significantly accelerate computations by exploiting the computational power of Graphics Processing Units (GPUs). The new accelerated solver, GPU-SPHEROS, is developed in CUDA and runs entirely on the GPU. All parallel algorithms and data structures have been designed specifically for the many-core GPU architecture. A roofline model is used to assess the performance of the kernels and to select appropriate optimization strategies. In particular, the neighbor search algorithm, which accounts for almost a third of the overall compute time, features an efficient Space-Filling Curve (SFC) and an optimized octree construction procedure. The memory-bound interaction vector computation, which accounts for almost two thirds of the overall compute time, uses fixed-size memory pre-allocation and an efficient data ordering to reduce memory transactions and the cost of dynamic memory operations, i.e., allocation and deallocation. As a case study, numerical simulation results of water jet deviation by the rotating buckets of a Pelton turbine are presented and compared with available experimental data. For that case, a speedup factor of almost six has been achieved on a single NVIDIA® Tesla™ P100-SXM2-16 GB GPU (GP100 Pascal architecture) compared with a dual-CPU node equipped with two Broadwell Intel® Xeon® E5-2690 v4 CPUs, totaling 28 physical cores.
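The idea behind an SFC-based neighbor search can be illustrated with a minimal sketch. The abstract does not say which curve GPU-SPHEROS uses, so the example below assumes a 3-D Morton (Z-order) curve, a common choice for particle sorting and octree construction. The names `Vec3`, `expandBits`, `morton3D`, `quantize` and `sortBySfc` are hypothetical and are not taken from the solver. Each particle position is quantized to a 1024³ grid over an assumed bounding box, the three 10-bit cell indices are bit-interleaved into a 30-bit key, and the particle indices are sorted by that key so that particles close in space tend to end up close in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical particle position type (illustration only, not from the paper).
struct Vec3 { float x, y, z; };

// Spread the lower 10 bits of v so that input bit i moves to output bit 3*i,
// leaving two zero bits between consecutive input bits.
// In CUDA the same function could be marked __host__ __device__.
std::uint32_t expandBits(std::uint32_t v) {
    v &= 0x000003FFu;                    // keep 10 bits
    v = (v ^ (v << 16)) & 0xFF0000FFu;
    v = (v ^ (v << 8))  & 0x0300F00Fu;
    v = (v ^ (v << 4))  & 0x030C30C3u;
    v = (v ^ (v << 2))  & 0x09249249u;
    return v;
}

// 30-bit Morton (Z-order) key from three 10-bit integer cell coordinates:
// x bits land at positions 3i, y at 3i+1, z at 3i+2.
std::uint32_t morton3D(std::uint32_t ix, std::uint32_t iy, std::uint32_t iz) {
    return expandBits(ix) | (expandBits(iy) << 1) | (expandBits(iz) << 2);
}

// Map a coordinate in [lo, hi] to an integer cell index in [0, 1023].
std::uint32_t quantize(float c, float lo, float hi) {
    assert(hi > lo);
    float t = (c - lo) / (hi - lo);                   // normalize to [0, 1]
    float cell = std::floor(t * 1024.0f);
    cell = std::min(std::max(cell, 0.0f), 1023.0f);   // clamp to the grid
    return static_cast<std::uint32_t>(cell);
}

// Return particle indices ordered along the Z-order curve (assumed bounding
// box [lo, hi]); ties keep their original relative order.
std::vector<std::size_t> sortBySfc(const std::vector<Vec3>& p, Vec3 lo, Vec3 hi) {
    std::vector<std::uint32_t> keys(p.size());
    for (std::size_t i = 0; i < p.size(); ++i) {
        keys[i] = morton3D(quantize(p[i].x, lo.x, hi.x),
                           quantize(p[i].y, lo.y, hi.y),
                           quantize(p[i].z, lo.z, hi.z));
    }
    std::vector<std::size_t> order(p.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    return order;
}
```

For example, `sortBySfc` applied to the points (0.9, 0.9, 0.9), (0.1, 0.1, 0.1) and (0.6, 0.1, 0.1) in the unit cube returns the order 1, 2, 0. This host-side sketch runs sequentially; on a GPU the keys would typically be computed in a kernel and sorted with a parallel radix sort, e.g. `thrust::sort_by_key` or CUB's `DeviceRadixSort`. Because each 3-bit group of a Morton key selects one child octant, particles that share a key prefix belong to the same octree node, which makes sorted keys a convenient starting point for octree construction.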
