We discuss the software implementation of a finite element method for simulating 3D flows in complex geometries that uses hardware resources effectively, including for problems with a priori unknown, heterogeneous, and dynamic parallel computational load. Our fundamental choices of data structures, algorithms, and software design specifically target engineering resolution and accuracy requirements. Some of these choices are: unstructured grids (to explicitly resolve complex 3D geometries), tetrahedra-only computational elements (to enable automatic mesh generation), an edge-based finite element scheme (to reduce indirect addressing and thereby increase performance), the distributed-memory parallel computing paradigm (to enable large problems), and Charm++ (https://charmplusplus.org) as the runtime system (to use computing resources effectively even in the presence of hardware heterogeneity and dynamic application requirements). We discuss aspects of the implementation that exercise unique features of the runtime system, e.g., the single Charm++ programming abstraction, overdecomposition, asynchronous execution, latency hiding by overlapping parallel communication with computation, task parallelism, and dynamic load balancing via object migratability. Multiple test problems are used to verify and validate the numerical solutions, and the computational performance and scalability in high-performance computing environments are discussed. For maximum transparency and reproducibility, and to encourage future research, development, and use, the full source code, together with regression tests and documentation, is publicly available at https://xyst.cc.