Advanced Parallelism of DGTD Method With Local Time Stepping Based on Novel MPI + MPI Unified Parallel Algorithm

Zhen Guo Ban, Yan Shi, Peng Wang

doi:10.1109/tap.2021.3137455

Abstract

In this communication, a novel message passing interface (MPI) parallel algorithm is developed for the nodal discontinuous Galerkin time-domain (NDGTD) method. A unified MPI + MPI technique is introduced to achieve extreme parallelism on large-scale computer clusters. Data are exchanged between CPU nodes using MPI persistent nonblocking two-sided communication, while processors within the same node access each other's data directly through MPI shared memory windows. Together these form a two-layered parallel architecture that minimizes communication. To further accelerate the solution of multiscale problems, the local time stepping (LTS) technique is employed in the NDGTD method, and a fast time step estimation method is presented. By achieving a high degree of overlap between data transmission and computation, the proposed MPI + MPI scheme overcomes the loss of parallel efficiency that the pure MPI technique suffers when LTS is used on large numbers of CPU cores. A parallel efficiency of up to 94% is achieved on 6400 CPU cores with an average load of about 1700 finite elements per core, and an 18-fold speedup of the time step estimation is obtained with fourth-order basis functions. Three practical complex examples are given to demonstrate the good performance of the proposed method.
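As a rough illustration of why LTS helps on multiscale meshes, the sketch below groups elements into time step levels. It assumes a power-of-two LTS scheme, in which an element at level k advances with step dt_min * 2**k. The abstract does not state which LTS variant the authors use, so this scheme, the function names `assign_lts_levels` and `update_counts`, and the sample values are all illustrative assumptions rather than the authors' algorithm.

```python
import math


def assign_lts_levels(local_dts, max_level=None):
    """Assign each element a level k so that it advances with dt_min * 2**k.

    local_dts: per-element stable time steps (assumed already estimated).
    max_level: optional cap on the number of levels (hypothetical parameter).
    """
    dt_min = min(local_dts)
    levels = []
    for dt in local_dts:
        # Largest k with dt_min * 2**k <= dt, so the element never
        # exceeds its own stable time step.
        k = int(math.floor(math.log2(dt / dt_min)))
        if max_level is not None:
            k = min(k, max_level)
        levels.append(k)
    return dt_min, levels


def update_counts(levels):
    """Element updates needed to advance one coarsest step, with and without LTS."""
    k_max = max(levels)
    # With LTS, a level-k element is updated 2**(k_max - k) times.
    with_lts = sum(2 ** (k_max - k) for k in levels)
    # Without LTS, every element uses dt_min and is updated 2**k_max times.
    without_lts = len(levels) * 2 ** k_max
    return with_lts, without_lts
```

For instance, five elements with local stable steps of 1.0, 1.5, 2.0, 4.0 and 9.0 (arbitrary units) fall into levels 0, 0, 1, 2 and 3 under this scheme, and advancing one coarsest step then takes 23 element updates instead of the 40 required when every element uses the global minimum step. The uneven per-element work that results is one reason load balancing and communication overlap become harder once LTS is enabled.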

Full Text