Large-Scale Parallelization Based on CPU and GPU Cluster for Cosmological Fluid Simulations

Chen Meng, Long-Long Feng, Long Wang, Weishan Zhu, Zongyan Cao

doi:10.1007/978-3-642-53962-6_18


https://doi.org/10.1007/978-3-642-53962-6_18


Abstract


In this study, we present our parallel implementation for large-scale cosmological simulations of 3D supersonic fluids on CPU and GPU clusters. Our development builds on WIGEON, an OpenMP-parallelized CPU code. Using the GPU as an accelerator yields a speedup of 13 to 31 times over the sequential Fortran code, depending on the specific GPU card. Furthermore, our results show that the pure MPI parallelization scales very well up to ten thousand CPU cores. In addition, we introduce a hybrid CPU/GPU parallelization scheme and present a detailed analysis of its speedup and scaling for different numbers of CPUs and GPU cards (up to 256 GPU cards, the limit of our available computing resources). The scaling efficiency and high speedup rely on the domain decomposition approach, optimization of the WENO algorithm, and a series of techniques for optimizing the CUDA implementation, especially its memory access pattern. We believe this hybrid MPI+CUDA code can be an excellent candidate for 10 Peta-scale computing and beyond.

Keywords: Cosmological hydrodynamics, heterogeneous, WENO, GPU, large-scale cluster
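The abstract credits the scaling to a domain decomposition approach but does not describe the layout. As an illustration only, the following minimal sketch assumes a regular 3D block decomposition in which each process owns one contiguous sub-block of the grid. The function names `split_axis` and `decompose` are hypothetical and are not taken from WIGEON or from the paper.

```python
from itertools import product


def split_axis(n_cells, n_parts):
    """Split n_cells into n_parts contiguous half-open ranges.

    Range sizes differ by at most one cell, so the load stays balanced
    even when n_cells is not divisible by n_parts.
    """
    base, extra = divmod(n_cells, n_parts)
    ranges = []
    start = 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def decompose(grid_shape, proc_shape):
    """Map each process coordinate (i, j, k) to the index block it owns.

    Assumption: a regular block layout. The paper's actual layout
    (slab, pencil, or block) is not stated in the abstract.
    """
    per_axis = [split_axis(n, p) for n, p in zip(grid_shape, proc_shape)]
    return {
        coords: tuple(per_axis[axis][c] for axis, c in enumerate(coords))
        for coords in product(*(range(p) for p in proc_shape))
    }


if __name__ == "__main__":
    # A 10 x 8 x 8 grid shared among a 3 x 2 x 2 process grid.
    for coords, block in sorted(decompose((10, 8, 8), (3, 2, 2)).items()):
        print(coords, block)
```

In a real MPI+CUDA code each block would also carry ghost zones exchanged with its neighbors, since a WENO stencil reads cells beyond the block boundary; that exchange is omitted here.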
