Hybrid parallelization of the total FETI solver

Lubomír Říha,Tomáš Brzobohatý,Alexandros Markopoulos

doi:10.1016/j.advengsoft.2016.04.004

Abstract

This paper describes our new hybrid parallelization of the Finite Element Tearing and Interconnecting (FETI) method for the multi-socket and multi-core computer cluster. This is an essential step in our development of the Hybrid FETI solver were small number of neighboring subdomains is aggregated into clusters and each cluster is processed by a single compute node.In our previous work we have implemented FETI solver using MPI parallelization into our ESPRESO solver. The proposed hybrid implementation provides better utilization of resources of modern HPC machines using advanced shared memory runtime systems such as Cilk++ runtime. Cilk++ is an alternative to OpenMP which is used by ESPRESO for shared memory parallelization.We have compared the performance of the hybrid parallelization to MPI-only parallelization. The results show that we have reduced both solver runtime and memory utilization. This allows a solver to use a larger number of smaller sub-domains and in order to solve larger problems using a limited number of compute nodes. This feature is essential for users with smaller computer clusters.In addition, we have evaluated this approach with large-scale benchmarks of size up to 1.3 billion of unknowns to show that the hybrid parallelization also reduces runtime of the FETI solver for these types of problems.

Full Text