Abstract

GPU/CPU heterogeneous parallel computing has been widely used and has achieved great performance; recently, it has also been introduced into whole-core transport calculations. To handle large-scale problems, the spatial domain decomposition (SDD) technique has been applied in whole-core transport applications. However, the communication burden becomes an important bottleneck in heterogeneous parallel SDD calculations, and the parallel efficiency decreases dramatically as the number of calculation domains increases. The GPU greatly accelerates the computation while the communication time remains unchanged, which makes the parallel efficiency of the heterogeneous calculation even worse. In this work, heterogeneous parallel SDD calculation is performed by applying both spatial domain decomposition and ray parallelization, and the parallel performance is analyzed. In addition, a new communication scheme is implemented to overlap the MPI communication with the neutron transport sweep on the GPU. To accomplish this scheme, both the MPI and CUDA protocols are employed. Numerical results show that the new scheme successfully hides the communication time and the data-copy time between GPU and CPU, which brings a significant improvement in parallel efficiency. With the new communication scheme, large-scale whole-core transport calculation becomes more practical.
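The abstract does not give the implementation details of the overlap scheme, but the general pattern it describes (non-blocking MPI exchange of boundary data proceeding concurrently with the GPU transport sweep) is commonly structured as below. This is a minimal MPI + CUDA sketch under assumed names: `sweep_interior`, `sweep_boundary`, and the flux buffers are illustrative placeholders, not the paper's actual code.

```cuda
// Hypothetical sketch of overlapping MPI communication with the
// GPU transport sweep; all kernel and buffer names are assumptions.
#include <mpi.h>
#include <cuda_runtime.h>

// Placeholder kernels: sweep cells independent of neighbor data,
// then sweep cells that need the received boundary angular flux.
__global__ void sweep_interior(double *flux);
__global__ void sweep_boundary(double *flux, const double *boundary);

void sdd_iteration(double *d_flux, double *d_boundary,
                   double *h_send, double *h_recv, int n_boundary,
                   int neighbor, cudaStream_t comm_stream,
                   cudaStream_t sweep_stream) {
    dim3 grid(64), block(128);   // illustrative launch configuration
    MPI_Request reqs[2];

    // 1. Stage the outgoing boundary angular flux on the host via a
    //    dedicated stream, then start the non-blocking exchange.
    cudaMemcpyAsync(h_send, d_boundary, n_boundary * sizeof(double),
                    cudaMemcpyDeviceToHost, comm_stream);
    cudaStreamSynchronize(comm_stream);
    MPI_Isend(h_send, n_boundary, MPI_DOUBLE, neighbor, 0,
              MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(h_recv, n_boundary, MPI_DOUBLE, neighbor, 0,
              MPI_COMM_WORLD, &reqs[1]);

    // 2. Sweep the interior region on a separate stream; this kernel
    //    runs while the MPI transfer is in flight, hiding the
    //    communication time behind GPU work.
    sweep_interior<<<grid, block, 0, sweep_stream>>>(d_flux);

    // 3. Once the exchange completes, upload the received boundary
    //    flux and finish the sweep for boundary-dependent cells.
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    cudaMemcpyAsync(d_boundary, h_recv, n_boundary * sizeof(double),
                    cudaMemcpyHostToDevice, comm_stream);
    cudaStreamSynchronize(comm_stream);
    sweep_boundary<<<grid, block, 0, sweep_stream>>>(d_flux, d_boundary);
}
```

The key design point is that only the interior sweep is scheduled during the exchange, since it has no dependence on neighbor-domain data; the communication and the device-host copies are thereby hidden, which is the effect the abstract reports.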
