Abstract

This study shows how the GPU performance of the ADER discontinuous Galerkin method in SeisSol (an earthquake simulation software) can be further improved while preserving the original design that ensures high CPU performance. We introduce a new code generator (“ChainForge”) that fuses consecutive batched matrix multiplications (“GEMMs”) into a single GPU kernel, holding intermediate results in shared memory for as long as necessary. The generator operates as an external module linked against YATeTo, SeisSol's domain-specific language; as a result, the original SeisSol source code remains largely unchanged. We discuss several challenges related to the automatic fusion of GPU kernels and provide solutions to them. Overall, Fused-GEMMs yield a 60% performance gain in SeisSol's wave propagation solver compared to the original GPU implementation. We demonstrate this on benchmarks as well as on a real production scenario simulating the 1994 Northridge earthquake.
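
The fusion idea can be illustrated with a minimal CPU-side sketch in C++. It is not ChainForge output and does not use any SeisSol or YATeTo API; the names `Mat`, `gemm`, `chainUnfused`, and `chainFused`, the fixed matrix size `N`, and the two-GEMM chain are all assumptions made for illustration. The unfused variant stores the intermediate product of every batch element in a batch-sized buffer (the analogue of GPU global memory), whereas the fused variant keeps it in a small per-element scratch buffer (a stand-in for shared memory).

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical fixed matrix size; real kernels involve many different shapes.
constexpr std::size_t N = 4;
using Mat = std::array<float, N * N>;  // row-major N x N matrix

// c = a * b for one small matrix.
void gemm(const Mat& a, const Mat& b, Mat& c) {
  for (std::size_t i = 0; i < N; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      float sum = 0.0f;
      for (std::size_t k = 0; k < N; ++k) sum += a[i * N + k] * b[k * N + j];
      c[i * N + j] = sum;
    }
  }
}

// Unfused: two passes over the batch. The intermediate a[e] * b of every
// element is written to and read back from a batch-sized buffer.
std::vector<Mat> chainUnfused(const std::vector<Mat>& a, const Mat& b,
                              const Mat& c) {
  std::vector<Mat> tmp(a.size());
  std::vector<Mat> out(a.size());
  for (std::size_t e = 0; e < a.size(); ++e) gemm(a[e], b, tmp[e]);
  for (std::size_t e = 0; e < a.size(); ++e) gemm(tmp[e], c, out[e]);
  return out;
}

// Fused: one pass over the batch. The intermediate lives only in a small
// local scratch buffer, which stands in for GPU shared memory here.
std::vector<Mat> chainFused(const std::vector<Mat>& a, const Mat& b,
                            const Mat& c) {
  std::vector<Mat> out(a.size());
  for (std::size_t e = 0; e < a.size(); ++e) {
    Mat scratch;
    gemm(a[e], b, scratch);    // first GEMM writes the intermediate
    gemm(scratch, c, out[e]);  // second GEMM consumes it immediately
  }
  return out;
}
```

Both variants compute `(a[e] * b) * c` for each batch element and return the same result; the difference lies only in where the intermediate is stored and in how many passes over the batch are needed, which on a GPU corresponds to the number of kernel launches and global-memory round trips.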
