Abstract
In this paper we discuss our approach to the MPI/GPU implementation of an Interior Penalty Discontinuous Galerkin Time domain (IPDGTD) method to solve the time dependent Maxwell's equations. In our approach, we exploit the inherent DGTD parallelism and describe a combined MPI/GPU and local time stepping implementation. This combination is aimed at increasing efficiency and reducing computational time, especially for multiscale applications. The CUDA programming model was used, together with non‐blocking MPI calls to overlap communications across the network. A 10× speedup compared to CPU clusters is observed for double precision arithmetic. Finally, for p = 1 basis functions, a good scalability with parallelization efficiency of 85% for up to 40 GPUs and 80% for up to 160 CPU cores was achieved on the Ohio Supercomputer Center's Glenn cluster.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have