Node-Level Optimization of a 3D Block-Based Multiresolution Compressible Flow Solver with Emphasis on Performance Portability

Nils Hoppe, Stefan Adami, Momme Allalen, Igor Pasichnyk, Nikolaus A. Adams

doi:10.1109/hpcs48598.2019.9188088

Abstract

Despite the enormous increase in computational power over the last decades, the numerical study of complex flows remains challenging. State-of-the-art techniques for simulating hyperbolic flows with discontinuities rely on computationally demanding nonlinear schemes, such as Riemann solvers with weighted essentially non-oscillatory (WENO) stencils and characteristic decomposition. To handle this complexity, the numerical load can be reduced with a multiresolution (MR) algorithm with local time stepping (LTS) running on modern high-performance computing (HPC) systems. The main challenge then lies in using the available HPC hardware efficiently. In this work, we evaluate the performance improvement that single instruction, multiple data (SIMD) optimizations bring to a Message Passing Interface (MPI)-parallelized MR solver. We present straightforward code modifications that enable auto-vectorization by the compiler while maintaining the modularity of the code at comparable performance. We demonstrate performance improvements for representative Euler flow examples on clusters based on the Intel Haswell and the Intel Xeon Phi Knights Landing (KNL) microarchitectures. The tests show single-core speedups of 1.7 on Haswell (1.9 on KNL) and average speedups of 1.4 on Haswell (1.6 on KNL).
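
The abstract does not spell out the specific code modifications. As a rough illustration of the kind of loop that compilers can auto-vectorize, here is a minimal C++ sketch of a fifth-order WENO reconstruction. It assumes the classic Jiang-Shu (WENO-JS) weights with a regularization of 1e-6 and one contiguous array per variable. The functions `weno5_reconstruct` and `weno5_faces` are hypothetical names introduced for illustration, and the sketch is not the solver's actual implementation. The loop body contains no branches, virtual calls, or indirect indexing, which is typically what a compiler needs before it will map iterations onto SIMD lanes.

```cpp
#include <cstddef>
#include <vector>

// Assumed WENO-JS parameters: linear (optimal) weights and regularization.
constexpr double kD0 = 0.1, kD1 = 0.6, kD2 = 0.3;
constexpr double kEps = 1.0e-6;

// Hypothetical kernel: face[i] = left-biased value u^-_{i+1/2} for i in [2, n-3].
// Other entries of face are left untouched. __restrict is a common compiler
// extension (GCC, Clang, ICC, MSVC) promising that u and face do not alias,
// which removes a runtime alias check that could block vectorization.
void weno5_reconstruct(const double* __restrict u, double* __restrict face,
                       std::size_t n) {
  if (n < 5) return;
  for (std::size_t i = 2; i + 2 < n; ++i) {
    const double um2 = u[i - 2], um1 = u[i - 1], u0 = u[i];
    const double up1 = u[i + 1], up2 = u[i + 2];

    // Third-order candidate reconstructions on the three substencils.
    const double v0 = (2.0 * um2 - 7.0 * um1 + 11.0 * u0) / 6.0;
    const double v1 = (-um1 + 5.0 * u0 + 2.0 * up1) / 6.0;
    const double v2 = (2.0 * u0 + 5.0 * up1 - up2) / 6.0;

    // Jiang-Shu smoothness indicators.
    const double d0a = um2 - 2.0 * um1 + u0, d0b = um2 - 4.0 * um1 + 3.0 * u0;
    const double d1a = um1 - 2.0 * u0 + up1, d1b = um1 - up1;
    const double d2a = u0 - 2.0 * up1 + up2, d2b = 3.0 * u0 - 4.0 * up1 + up2;
    const double beta0 = 13.0 / 12.0 * d0a * d0a + 0.25 * d0b * d0b;
    const double beta1 = 13.0 / 12.0 * d1a * d1a + 0.25 * d1b * d1b;
    const double beta2 = 13.0 / 12.0 * d2a * d2a + 0.25 * d2b * d2b;

    // Unnormalized nonlinear weights, computed branch-free.
    const double e0 = kEps + beta0, e1 = kEps + beta1, e2 = kEps + beta2;
    const double alpha0 = kD0 / (e0 * e0);
    const double alpha1 = kD1 / (e1 * e1);
    const double alpha2 = kD2 / (e2 * e2);

    // Convex combination of the candidates with normalized weights.
    face[i] = (alpha0 * v0 + alpha1 * v1 + alpha2 * v2) /
              (alpha0 + alpha1 + alpha2);
  }
}

// Hypothetical convenience wrapper: boundary faces without a full stencil stay 0.
std::vector<double> weno5_faces(const std::vector<double>& u) {
  std::vector<double> face(u.size(), 0.0);
  weno5_reconstruct(u.data(), face.data(), u.size());
  return face;
}
```

Compiling with `-O3 -march=native` and GCC's `-fopt-info-vec` (or Clang's `-Rpass=loop-vectorize`) reports whether the loop was vectorized. Keeping the stencil arithmetic in a plain loop over contiguous arrays, rather than calling small functions through pointers per cell, is one way to let the compiler vectorize without hand-written intrinsics.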
