Communication and Load Balancing Optimization for Finite Element Electromagnetic Simulations Using Multi-GPU Workstation

Adam Dziekonski, Michal Mrozowski, Adam Lamecki, Piotr Sypek

doi:10.1109/tmtt.2017.2714670

Abstract

This paper considers a method for accelerating finite-element simulations of electromagnetic problems on a workstation using graphics processing units (GPUs). The focus is on finite-element formulations using higher order elements and tetrahedral meshes that lead to sparse matrices too large to be dealt with on a typical workstation using direct methods. We discuss the problem of rapid matrix generation and assembly, as well as accelerating preconditioned iterative solvers in the context of limited on-board GPU memory, and we show how to mitigate some of these problems using multiple GPUs. We propose a new fast data-distribution technique for multi-GPU platforms that allows optimal splitting of finite-element method (FEM) matrices between graphics accelerators. The technique draws upon the graph partitioning approach used in nonoverlapping domain-decomposition methods and provides information that drives the FEM matrix-generation and assembly process in such a way that it produces data structures for each GPU; this not only ensures load balancing and minimizes communication between GPUs, but also reflects the hierarchy of the basis functions. The concepts proposed in this paper are illustrated with examples involving sparse matrices of up to 13.9 million rows and over a billion nonzero elements.
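
The two goals named in the abstract, load balancing and low inter-GPU communication, can be made concrete with a small sketch. This is not the authors' technique: a plain contiguous row split stands in for the graph partitioning step, and it assumes a square sparse matrix stored in CSR form. The function names `split_rows_by_nnz` and `halo_sizes` are hypothetical, introduced only for illustration.

```python
from bisect import bisect_left


def split_rows_by_nnz(row_ptr, num_parts):
    """Split CSR rows into contiguous blocks holding roughly equal nonzeros.

    row_ptr is the CSR row-pointer array (length n_rows + 1).
    Returns block boundaries [b0, b1, ..., b_num_parts]; block p owns
    rows b_p .. b_{p+1} - 1. This is a simple stand-in for a graph
    partitioner, not the method described in the paper.
    """
    n_rows = len(row_ptr) - 1
    total_nnz = row_ptr[-1]
    bounds = [0]
    for p in range(1, num_parts):
        target = total_nnz * p / num_parts
        # First row boundary whose cumulative nonzero count reaches the target.
        cut = bisect_left(row_ptr, target)
        # Keep boundaries non-decreasing and within the matrix.
        cut = min(max(cut, bounds[-1]), n_rows)
        bounds.append(cut)
    bounds.append(n_rows)
    return bounds


def halo_sizes(row_ptr, col_idx, bounds):
    """Count, per block, the distinct columns that fall outside the block.

    In a sparse matrix-vector product, each such column is a vector entry
    the owning device must obtain from another device, so the count is a
    rough proxy for communication volume.
    """
    sizes = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        block_cols = col_idx[row_ptr[lo]:row_ptr[hi]]
        remote = {c for c in block_cols if not (lo <= c < hi)}
        sizes.append(len(remote))
    return sizes
```

For a tridiagonal matrix split in two, each block needs only one remote vector entry. Denser coupling between blocks raises that count, and a partitioner that minimizes the edge cut is aimed at reducing exactly this kind of coupling.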

Full Text