Abstract

GPU/CPU heterogeneous parallel computing has been widely used and has achieved great performance; recently, it has also been introduced into whole-core transport calculations. To handle large-scale problems, the spatial domain decomposition (SDD) technique has been applied in whole-core transport applications. However, the communication burden becomes an important bottleneck in heterogeneous parallel SDD calculations, and the parallel efficiency decreases dramatically as the number of calculation domains increases. The GPU greatly accelerates the computation while the communication time remains unchanged, which makes the parallel efficiency of the heterogeneous calculation even worse. In this work, heterogeneous parallel SDD calculation is performed by applying both spatial domain decomposition and ray parallelization, and the parallel performance is analyzed. In addition, a new communication scheme is implemented to overlap the MPI communication with the neutron transport sweep on the GPU. To accomplish this scheme, both the MPI and CUDA protocols are employed. Numerical results show that the new scheme successfully hides the communication time and the data-copy time between GPU and CPU, which brings a significant improvement in parallel efficiency. With the new communication scheme, large-scale whole-core transport calculation becomes more practical.
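The abstract does not give the implementation details of the overlap scheme, but the general pattern it describes (non-blocking MPI exchange of boundary data proceeding concurrently with the GPU transport sweep) is commonly structured as below. This is a minimal MPI + CUDA sketch under assumed names: `sweep_interior`, `sweep_boundary`, and the flux buffers are illustrative placeholders, not the paper's actual code.

```cuda
// Hypothetical sketch of overlapping MPI communication with the
// GPU transport sweep; all kernel and buffer names are assumptions.
#include <mpi.h>
#include <cuda_runtime.h>

// Placeholder kernels: sweep cells independent of neighbor data,
// then sweep cells that need the received boundary angular flux.
__global__ void sweep_interior(double *flux);
__global__ void sweep_boundary(double *flux, const double *boundary);

void sdd_iteration(double *d_flux, double *d_boundary,
                   double *h_send, double *h_recv, int n_boundary,
                   int neighbor, cudaStream_t comm_stream,
                   cudaStream_t sweep_stream) {
    dim3 grid(64), block(128);   // illustrative launch configuration
    MPI_Request reqs[2];

    // 1. Stage the outgoing boundary angular flux on the host via a
    //    dedicated stream, then start the non-blocking exchange.
    cudaMemcpyAsync(h_send, d_boundary, n_boundary * sizeof(double),
                    cudaMemcpyDeviceToHost, comm_stream);
    cudaStreamSynchronize(comm_stream);
    MPI_Isend(h_send, n_boundary, MPI_DOUBLE, neighbor, 0,
              MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(h_recv, n_boundary, MPI_DOUBLE, neighbor, 0,
              MPI_COMM_WORLD, &reqs[1]);

    // 2. Sweep the interior region on a separate stream; this kernel
    //    runs while the MPI transfer is in flight, hiding the
    //    communication time behind GPU work.
    sweep_interior<<<grid, block, 0, sweep_stream>>>(d_flux);

    // 3. Once the exchange completes, upload the received boundary
    //    flux and finish the sweep for boundary-dependent cells.
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    cudaMemcpyAsync(d_boundary, h_recv, n_boundary * sizeof(double),
                    cudaMemcpyHostToDevice, comm_stream);
    cudaStreamSynchronize(comm_stream);
    sweep_boundary<<<grid, block, 0, sweep_stream>>>(d_flux, d_boundary);
}
```

The key design point is that only the interior sweep is scheduled during the exchange, since it has no dependence on neighbor-domain data; the communication and the device-host copies are thereby hidden, which is the effect the abstract reports.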
