Abstract
In this study, we present our parallel implementation for large-scale cosmological simulations of 3D supersonic fluids on CPU and GPU clusters. Our work builds on WIGEON, an OpenMP-parallelized CPU code. We show that using the GPU as an accelerator achieves a speedup of 13 to 31 (depending on the specific GPU card) over the sequential Fortran code. Furthermore, our results show that the pure MPI parallelization scales very well up to ten thousand CPU cores. In addition, we introduce a hybrid CPU/GPU parallelization scheme and present a detailed analysis of its speedup and scaling for different numbers of CPUs and GPU cards (up to 256 GPU cards, owing to limited computing resources). The high speedup and efficient scaling rely on a domain decomposition approach, optimization of the WENO algorithm, and a series of techniques for optimizing the CUDA implementation, especially its memory access pattern. We believe this hybrid MPI+CUDA code is an excellent candidate for 10-petascale computing and beyond.

Keywords: Cosmological hydrodynamics; heterogeneous; WENO; GPU; large-scale cluster
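The abstract attributes part of the performance to an optimized WENO algorithm but does not describe the scheme itself. As background, here is a minimal sketch of the standard fifth-order Jiang-Shu WENO reconstruction (WENO5-JS) for one scalar on a uniform 1D grid. It assumes a fifth-order variant and is not WIGEON's optimized implementation. The function names `weno5_left` and `reconstruct_interfaces`, and the choice ε = 1e-6, are illustrative assumptions.

```python
"""Fifth-order WENO (Jiang-Shu) reconstruction sketch: 1D, scalar, uniform grid.

Illustrative only; the names and EPS value are assumptions, not WIGEON's code.
"""

EPS = 1e-6           # small constant preventing division by zero (assumed value)
D = (0.1, 0.6, 0.3)  # linear (optimal) weights of the three sub-stencils


def weno5_left(vm2, vm1, v0, vp1, vp2, eps=EPS):
    """Reconstruct the left-biased interface value v_{i+1/2}^- from the
    five cell averages v_{i-2}, ..., v_{i+2}."""
    # Third-order candidate reconstructions on the three sub-stencils.
    q0 = (2.0 * vm2 - 7.0 * vm1 + 11.0 * v0) / 6.0
    q1 = (-vm1 + 5.0 * v0 + 2.0 * vp1) / 6.0
    q2 = (2.0 * v0 + 5.0 * vp1 - vp2) / 6.0

    # Smoothness indicators: large when a sub-stencil spans a discontinuity.
    b0 = 13.0 / 12.0 * (vm2 - 2.0 * vm1 + v0) ** 2 + 0.25 * (vm2 - 4.0 * vm1 + 3.0 * v0) ** 2
    b1 = 13.0 / 12.0 * (vm1 - 2.0 * v0 + vp1) ** 2 + 0.25 * (vm1 - vp1) ** 2
    b2 = 13.0 / 12.0 * (v0 - 2.0 * vp1 + vp2) ** 2 + 0.25 * (3.0 * v0 - 4.0 * vp1 + vp2) ** 2

    # Nonlinear weights: close to D in smooth regions, near zero for
    # sub-stencils that cross a shock.
    a = [d / (eps + b) ** 2 for d, b in zip(D, (b0, b1, b2))]
    s = a[0] + a[1] + a[2]
    return (a[0] * q0 + a[1] * q1 + a[2] * q2) / s


def reconstruct_interfaces(v):
    """Return v_{i+1/2}^- for every i with a full five-point stencil
    (i = 2 .. len(v) - 3). Ghost cells are assumed to be supplied by the
    caller, e.g. by a halo exchange between neighbouring subdomains."""
    return [weno5_left(*v[i - 2:i + 3]) for i in range(2, len(v) - 2)]


if __name__ == "__main__":
    # Linear data is reproduced exactly up to rounding: about [2.5, 3.5, 4.5, 5.5].
    print(reconstruct_interfaces([float(j) for j in range(8)]))
    # Across a step, the weights favour the smooth stencil, so no overshoot appears.
    print(weno5_left(0.0, 0.0, 0.0, 1.0, 1.0))
```

Because each interface value depends only on its five neighbouring cells, the reconstruction is data-parallel across interfaces. That property suits both a domain decomposition with two-cell halos and a one-thread-per-interface GPU mapping.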