Abstract

In this work, we present our implementation of a three-dimensional, fifth-order finite-difference weighted essentially non-oscillatory (WENO) scheme in double precision on CPU/GPU clusters, targeting large-scale cosmological hydrodynamic flow simulations that involve both shocks and complicated smooth solution structures. At the MPI level, the domain is decomposed along all three axial directions; within each process, the WENO computation is then ported to the GPU. The method is memory-bound, owing chiefly to the computation of the nonlinear weights, and this becomes an even greater challenge for a three-dimensional, high-order problem in double precision. To exploit the GPU's computing power while working within its memory limitations, we performed a series of optimizations focused on memory access patterns at every level. We subjected the code to a number of standard tests to evaluate its effectiveness and efficiency. Our tests indicate that, relative to a single-threaded Fortran reference code, the GPU version achieves a 12-19x overall speed-up, and roughly 19-36x in the computation part alone. We analyze the results on both Fermi and Kepler GPUs, and outline what is needed to further increase performance, chiefly by reducing the time spent on communication, along with other future work.
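To illustrate why the weight computation dominates memory traffic, the following is a minimal CUDA sketch (not the authors' code; the array layout, grid mapping, and parameter names are our assumptions) of a standard fifth-order WENO reconstruction along the x direction at cell faces, one thread per interior point, in double precision. Each thread loads a five-point stencil but performs only a modest number of flops, and sweeps along y and z over the same flattened array would not access memory contiguously, which is the kind of pattern that motivates optimizing memory access at every level.

    #include <cuda_runtime.h>

    // Hypothetical layout: v[(k*ny + j)*nx + i], reconstruction at the i+1/2 face.
    __global__ void weno5_x(const double* __restrict__ v,
                            double* __restrict__ vface,
                            int nx, int ny, int nz)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        int k = blockIdx.z;
        if (i < 2 || i >= nx - 2 || j >= ny || k >= nz) return;

        long idx = ((long)k * ny + j) * nx + i;
        // Five-point stencil: 5 double loads per thread, few flops -> memory-bound.
        double vm2 = v[idx - 2], vm1 = v[idx - 1], v0 = v[idx];
        double vp1 = v[idx + 1], vp2 = v[idx + 2];

        // Jiang-Shu smoothness indicators for the three candidate stencils.
        double b0 = 13.0/12.0*(vm2 - 2.0*vm1 + v0)*(vm2 - 2.0*vm1 + v0)
                  + 0.25*(vm2 - 4.0*vm1 + 3.0*v0)*(vm2 - 4.0*vm1 + 3.0*v0);
        double b1 = 13.0/12.0*(vm1 - 2.0*v0 + vp1)*(vm1 - 2.0*v0 + vp1)
                  + 0.25*(vm1 - vp1)*(vm1 - vp1);
        double b2 = 13.0/12.0*(v0 - 2.0*vp1 + vp2)*(v0 - 2.0*vp1 + vp2)
                  + 0.25*(3.0*v0 - 4.0*vp1 + vp2)*(3.0*v0 - 4.0*vp1 + vp2);

        // Nonlinear weights from the ideal weights d = (1/10, 6/10, 3/10).
        const double eps = 1.0e-6;
        double a0 = 0.1 / ((eps + b0)*(eps + b0));
        double a1 = 0.6 / ((eps + b1)*(eps + b1));
        double a2 = 0.3 / ((eps + b2)*(eps + b2));
        double s  = a0 + a1 + a2;

        // Candidate third-order reconstructions at i+1/2, blended by the weights.
        double q0 = ( 2.0*vm2 - 7.0*vm1 + 11.0*v0 ) / 6.0;
        double q1 = (-1.0*vm1 + 5.0*v0  +  2.0*vp1) / 6.0;
        double q2 = ( 2.0*v0  + 5.0*vp1 -  1.0*vp2) / 6.0;

        vface[idx] = (a0*q0 + a1*q1 + a2*q2) / s;
    }

In an x sweep like this one, consecutive threads read consecutive addresses, so the loads coalesce naturally; the harder cases are the y and z sweeps, where a naive port strides through memory and wastes bandwidth.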
