Abstract

This work shows that, with a few standard OpenMP compiler directives and without any major source-code modifications, simulations of both steady and unsteady incompressible fluid flows by the implicit stiffly stable time-stepping projection method can easily be ported to run in parallel on any shared-memory multicore system with reasonably good computational performance. The shared-memory multicore architecture is the state-of-the-art hardware environment for modern microprocessors, including the compute nodes of supercomputers, workstations, and personal desktops and laptops. This work modifies the original semi-implicit stiffly stable time-stepping projection method to be fully implicit. Following the modified method, the approximation of the incompressible Navier-Stokes (INS) equations is broken into a sequence of time steps. Each step consists of three stages, dealing respectively with the advection, pressure, and viscosity terms of the INS equations. Each stage of every step is governed by its own equation(s); spatial discretization transforms the partial differential equation(s) into a set of linear algebraic equations, written in general matrix form as Ax = b. The ILU(0) preconditioner and the generalized minimal residual (GMRES) algorithm are used for the preconditioning and solution of Ax = b, respectively. Using the Kovasznay and Pearson vortex flows as reference problems for steady and unsteady incompressible flows, respectively, profiling tests demonstrate that for both flows more than 99% of the total simulation time is spent on constructing the ILU(0) preconditioner and running GMRES, and these two components are therefore singled out for OpenMP parallelization. Benchmarked on a compute node with 16 CPU cores, mean parallel efficiencies of 72% and 73% are achieved for the Kovasznay and Pearson vortex flow simulations, respectively; when all 16 cores are used, the two simulations are accelerated by factors of 8.0 and 8.17, respectively. The performance and efficiency of the OpenMP parallel implementation also depend on the hardware system: running the Kovasznay flow simulation on a 12-core workstation yields a mean parallel efficiency of 89%, in contrast to the 72% achieved on the 16-core compute node.
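The abstract does not reproduce the time-stepping scheme itself. For orientation, a textbook form of the original semi-implicit stiffly stable splitting (Karniadakis, Israeli and Orszag, 1991), which this work modifies to be fully implicit, reads as follows; the coefficients and notation here are the standard ones, not necessarily the paper's:

\[
\frac{\hat{\mathbf{u}} - \sum_{q=0}^{J-1}\alpha_q\,\mathbf{u}^{n-q}}{\Delta t}
= -\sum_{q=0}^{J-1}\beta_q\left[(\mathbf{u}\cdot\nabla)\mathbf{u}\right]^{n-q}
\quad\text{(advection)},
\]
\[
\frac{\hat{\hat{\mathbf{u}}} - \hat{\mathbf{u}}}{\Delta t} = -\nabla p^{n+1},
\qquad
\nabla^2 p^{n+1} = \nabla\cdot\!\left(\frac{\hat{\mathbf{u}}}{\Delta t}\right)
\quad\text{(pressure)},
\]
\[
\frac{\gamma_0\,\mathbf{u}^{n+1} - \hat{\hat{\mathbf{u}}}}{\Delta t} = \nu\,\nabla^2\mathbf{u}^{n+1}
\quad\text{(viscosity)},
\]

where \(\alpha_q\), \(\beta_q\), and \(\gamma_0\) are the stiffly stable integration coefficients and \(J\) is the integration order. After spatial discretization, each stage reduces to a linear system Ax = b.

To illustrate the kind of directive-based parallelization the abstract describes, the sketch below applies a single OpenMP directive to a CSR sparse matrix-vector product, the kernel that dominates each GMRES iteration. This is a minimal illustration under assumed CSR array names (row_ptr, col_idx, val), not the authors' code; parallelizing the ILU(0) triangular solves in the preconditioner additionally requires level scheduling or coloring, which is not shown here.

```c
#include <stdio.h>
#include <omp.h>   /* compile with -fopenmp */

/* y = A*x for an n-by-n matrix A stored in CSR format.
 * Rows are independent, so one directive parallelizes the outer loop. */
static void spmv_csr(int n, const int *row_ptr, const int *col_idx,
                     const double *val, const double *x, double *y)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}

int main(void)
{
    /* 3x3 tridiagonal test matrix [2 -1 0; -1 2 -1; 0 -1 2] in CSR form. */
    int    row_ptr[] = {0, 2, 5, 7};
    int    col_idx[] = {0, 1, 0, 1, 2, 1, 2};
    double val[]     = {2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0};
    double x[3] = {1.0, 1.0, 1.0}, y[3];

    spmv_csr(3, row_ptr, col_idx, val, x, y);
    printf("y = [%g %g %g]\n", y[0], y[1], y[2]); /* expected: [1 0 1] */
    return 0;
}
```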
