Performance Comparison of HPX Versus Traditional Parallelization Strategies for the Discontinuous Galerkin Method

Maximilian Bremer,Kazbek Kazhyken,Hartmut Kaiser,Clint Dawson,Craig Michoski

doi:10.1007/s10915-019-00960-z

Abstract

As high performance computing moves towards the exascale computing regime, applications are required to expose increasingly fine grain parallelism to efficiently use next generation supercomputers. Intended as a solution to the programming challenges associated with these architectures, High Performance ParalleX (HPX) is a task-based C++ runtime, which emphasizes the use of lightweight threads and algorithm-dependent synchronization to maximize parallelism exposed by the application to the machine. The aim of this work is to explore the performance benefits of an HPX parallelization versus a MPI parallelization for the discontinuous Galerkin finite element method for the two-dimensional shallow water equations. We present strong and weak scaling results comparing the performance of HPX versus a MPI parallelization strategy on Knights Landing architectures. Our results indicate that for average task sizes of $$3.6\,{\mathrm {m s}}$$ , HPX’s runtime overhead is offset by more efficient execution of the application. Furthermore, we demonstrate that running with sufficiently large task granularity, HPX is able to outperform the MPI parallelization by a factor of approximately 1.2 for up to 128 nodes.

Full Text