Parallel Multigrid Method on Multicore/Manycore Clusters

Kengo Nakajima

doi:10.1145/3373271.3373273

Abstract

The parallel multigrid method is expected to be a useful algorithm in the exascale era because of its scalability. It is widely known that the overhead of the coarse grid solver in a parallel multigrid method becomes significant when the number of MPI processes is O(10^4) or larger. The author previously proposed the hCGA to avoid this overhead. More recently, the author proposed the AM-hCGA, a further optimized version of the hCGA, and evaluated its performance on the Oakforest-PACS system (OFP) with IHK/McKernel at JCAHPC using up to 2,048 nodes of Intel Xeon Phi (Knights Landing). In the present work, the developed method is also implemented on the Oakbridge-CX system (OBCX) at the University of Tokyo using up to 1,024 nodes (2,048 sockets) of Intel Xeon Platinum 8280 (Cascade Lake). Weak and strong scaling performance is evaluated for an application to 3D groundwater flow through heterogeneous porous media (pGW3D-FVM). The hCGA and the AM-hCGA provide excellent performance on both OFP and OBCX at larger node counts. In particular, strong scaling performance on OBCX was excellent.

Full Text