Block-asynchronous and Jacobi smoothers for a multigrid solver on GPU-accelerated HPC clusters

Martin Wlotzka, Vincent Heuveline

doi:10.11588/emclpp.2015.03.23465

Abstract

We investigate CPU- and GPU-based damped block-asynchronous iteration as an alternative to the damped CPU-based Jacobi smoother in a geometric multigrid linear solver. We describe the implementation for distributed memory systems as well as for CUDA-capable accelerators. Our numerical experiments are based on the linear problem arising from a finite element discretization of the Poisson equation. Runtime and energy measurements are presented for a dual-CPU test system equipped with a GPU. We find that the smoothing properties of the block-asynchronous smoothers are diminished by their asynchronous nature. When using a domain decomposition, damped synchronized Jacobi iteration as the smoother, with CPU-only computation on multiple host processes, yields better performance and lower energy consumption than the block-asynchronous variants for both CPU and GPU execution. However, for a single host process without domain decomposition, the GPU-accelerated block-asynchronous method can compensate for the diminished smoothing property and outperforms the CPU-only execution in terms of both runtime and energy consumption.
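
To illustrate what is meant by the smoothing property of damped Jacobi iteration, the following minimal sketch applies a few sweeps to a low-frequency and a high-frequency error mode. It is not the implementation studied in this work: the 1D tridiagonal Poisson matrix, the damping factor `omega = 2/3`, the number of sweeps, and the function names `damped_jacobi` and `poisson_1d` are all assumptions chosen for illustration.

```python
import numpy as np

def damped_jacobi(A, b, x, omega=2.0 / 3.0, sweeps=3):
    """Apply damped Jacobi sweeps: x <- x + omega * D^{-1} (b - A x)."""
    d = np.diag(A)  # diagonal of A, assumed nonzero
    for _ in range(sweeps):
        x = x + omega * (b - A @ x) / d
    return x

def poisson_1d(n):
    """Tridiagonal (-1, 2, -1) matrix, a 1D stand-in for the Poisson problem."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

n = 63
A = poisson_1d(n)
b = np.zeros(n)  # zero right-hand side, so the iterate is the error itself
grid = np.arange(1, n + 1) / (n + 1)

smooth_mode = np.sin(np.pi * grid)       # low-frequency error component
rough_mode = np.sin(48 * np.pi * grid)   # high-frequency error component

# Ratio of error norm after smoothing to error norm before smoothing.
smooth_ratio = np.linalg.norm(damped_jacobi(A, b, smooth_mode)) / np.linalg.norm(smooth_mode)
rough_ratio = np.linalg.norm(damped_jacobi(A, b, rough_mode)) / np.linalg.norm(rough_mode)

print(f"low-frequency error reduction:  {smooth_ratio:.4f}")
print(f"high-frequency error reduction: {rough_ratio:.4f}")
```

In this setting the high-frequency mode is damped strongly while the low-frequency mode is left almost unchanged, which is why such an iteration is used as a smoother inside multigrid rather than as a stand-alone solver.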
