Load Balancing and Patch-Based Parallel Adaptive Mesh Refinement for Tsunami Simulation on Heterogeneous Platforms Using Xeon Phi Coprocessors

Chaulio R Ferreira,Michael Bader

doi:10.1145/3093172.3093237

Abstract

We present a patch-based approach for tsunami simulation with parallel adaptive mesh refinement on the Salomon supercomputer. The special architecture of Salomon, with two Intel Xeon CPUs (Haswell architecture) and two Intel Xeon Phi coprocessors (Knights Corner) per compute node, suggests truly heterogeneous load balancing instead of offload approaches, because host and accelerator achieve comparable performance for our simulations.We use a tree-structured mesh refinement strategy resulting from newest-vertex bisection of triangular grid cells, but introduce small uniform grid patches into the leaves of the tree to allow vectorisation of the Finite Volume solver over grid cells. In particular, we implemented vectorised versions of the approximate Riemann solvers, exploiting Fortran's array notations where possible. While large patches increase computational performance due to vectorisation, improved memory access and reduced meshing overhead, they also increase the overall number of processed cells. Thus, a trade-off must be found regarding the patch size. We experimented with different patch sizes in a study of the time-to-solution of a simulation of the 2011 Tohoku tsunami, and found that relatively small patches with 82 cells resulted in the smallest execution times.We use the Xeon Phis in symmetric mode and apply heterogeneous load balancing between hosts and coprocessors, identifying the relative load distribution either from on-the-fly runtime measurements or from a priori exhaustive testing. Both approaches perform better than homogeneous load balancing and better than using only the CPUs or only the Xeon Phi coprocessors in native mode. In all set-ups, however, the absolute speedups are impeded by the slow MPI communication between Xeon Phi coprocessors.

Full Text