Heterogeneous multicore clusters are increasingly popular for high-performance computing because of their computing power and favorable cost-to-performance ratio. Nevertheless, degradation of parallel efficiency remains a problem in large-scale structural analysis on such clusters. To address it, a hybrid hierarchical parallel algorithm (HHPA) is proposed on the basis of the conventional domain decomposition algorithm (CDDA) and a parallel sparse solver. The new algorithm introduces a three-layer parallelization of the computational procedure: by mapping computing tasks to the different hardware layers, it separates communication between nodes, between heterogeneous core groups (HCGs), and within HCGs. This approach not only achieves load balancing efficiently at each layer but also improves the communication rate significantly through hierarchical communication. In addition, the proposed hybrid parallel approach reduces the size of the interface equation and thereby the solution time, which compensates for a shortcoming of CDDA, namely that communication overhead grows with the size of the interface equation. Moreover, distributed sparse storage of large amounts of data is introduced to improve memory access. Results for benchmark instances solved on the Shenwei-Taihuzhiguang (Sunway TaihuLight) supercomputer show that the proposed method achieves higher speedup and parallel efficiency than CDDA and better scalability of the parallel partitioning than the two-level parallel computing algorithm (TPCA).
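The three-layer mapping described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Placement`, `place` and `comm_layer` are hypothetical, and the layout parameters (4 core groups per node, 64 cores per group) are assumptions chosen only for the example. The sketch assigns a flat task index to a (node, HCG, core) triple and then classifies which communication layer a pair of tasks would use.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    """Hypothetical position of a task in the three hardware layers."""
    node: int        # layer 1: compute node
    core_group: int  # layer 2: heterogeneous core group (HCG) within the node
    core: int        # layer 3: core within the HCG


def place(task_id: int, groups_per_node: int, cores_per_group: int) -> Placement:
    """Map a flat task index to (node, HCG, core), filling cores first."""
    rest, core = divmod(task_id, cores_per_group)
    node, core_group = divmod(rest, groups_per_node)
    return Placement(node, core_group, core)


def comm_layer(a: Placement, b: Placement) -> str:
    """Return the communication layer that two placements would share."""
    if a.node != b.node:
        return "inter-node"
    if a.core_group != b.core_group:
        return "inter-HCG"
    return "intra-HCG"


# Assumed layout for illustration only: 4 HCGs per node, 64 cores per HCG.
GROUPS_PER_NODE = 4
CORES_PER_GROUP = 64

t0 = place(0, GROUPS_PER_NODE, CORES_PER_GROUP)
t1 = place(1, GROUPS_PER_NODE, CORES_PER_GROUP)
t64 = place(64, GROUPS_PER_NODE, CORES_PER_GROUP)
t256 = place(256, GROUPS_PER_NODE, CORES_PER_GROUP)

print(comm_layer(t0, t1))    # same HCG
print(comm_layer(t0, t64))   # same node, different HCG
print(comm_layer(t0, t256))  # different nodes
```

Under a mapping of this kind, only tasks that differ at the node level need to use the slowest communication path, which is the intuition behind separating the three communication layers.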