Abstract
A hierarchical matrix is an approximate form of a dense matrix that represents the N-by-N correlations among N objects. A hierarchical matrix is constructed by dividing the matrix into submatrices (partitioning) and then calculating the entries of these submatrices (filling). Partitioning consists of two sub-steps: cluster tree (CT) construction, which hierarchically divides the objects into clusters, and block cluster tree (BCT) construction, which finds all cluster pairs at the same level of the cluster tree that satisfy a given admissibility condition. Task parallelism can be applied to filling by treating each submatrix as a unit of parallelization. In existing implementations, partitioning is executed redundantly on every computing node, and filling is executed in parallel by statically assigning a set of tasks to each worker. However, such implementations cannot achieve good load balance because it is difficult to predict the workload of each task precisely. In this paper, we propose a new parallel implementation of hierarchical matrix construction in which BCT construction is executed in parallel using all workers of all computing nodes. When a worker finds a BCT leaf during BCT construction, it executes filling for the submatrix corresponding to that leaf. To achieve good load balance, we parallelized the BCT construction using the task-parallel language Tascell, which enables easy and efficient parallelization of tree-recursive algorithms by employing a dynamic load balancing strategy. In numerical experiments with 3D electric field analyses using up to 16 computing nodes, each of which has 36 cores, our implementation achieved up to a 1.9-fold speedup compared to an implementation using the static task assignment strategy.
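The construction described above can be illustrated with a minimal sequential sketch in Python. It is not the Tascell implementation from the paper, and several details are assumptions made only for illustration: the cluster tree is built by bisecting along the longest bounding-box axis, the admissibility condition is taken to be min(diam(a), diam(b)) <= eta * dist(a, b), the kernel is a hypothetical 1/r interaction, and every leaf is filled densely (practical hierarchical matrix codes typically store admissible blocks in low-rank form). The names `Cluster`, `build_bct`, `fill`, `leaf_size`, and `eta` are all hypothetical.

```python
import math

class Cluster:
    """Cluster tree (CT) node: a set of point indices plus its bounding box."""
    def __init__(self, indices, points, leaf_size):
        self.indices = indices
        coords = [points[i] for i in indices]
        self.lo = [min(c[d] for c in coords) for d in range(3)]
        self.hi = [max(c[d] for c in coords) for d in range(3)]
        self.children = []
        if len(indices) > leaf_size:
            # Assumption: split at the median along the longest box axis.
            axis = max(range(3), key=lambda d: self.hi[d] - self.lo[d])
            ordered = sorted(indices, key=lambda i: points[i][axis])
            mid = len(ordered) // 2
            self.children = [Cluster(ordered[:mid], points, leaf_size),
                             Cluster(ordered[mid:], points, leaf_size)]

def diameter(c):
    return math.sqrt(sum((c.hi[d] - c.lo[d]) ** 2 for d in range(3)))

def distance(a, b):
    # Distance between the two axis-aligned bounding boxes (0 if they touch).
    gaps = [max(0.0, a.lo[d] - b.hi[d], b.lo[d] - a.hi[d]) for d in range(3)]
    return math.sqrt(sum(g * g for g in gaps))

def admissible(a, b, eta):
    # Assumed admissibility condition; the paper only says "a given" one.
    dist = distance(a, b)
    return dist > 0.0 and min(diameter(a), diameter(b)) <= eta * dist

def kernel(p, q):
    # Hypothetical 1/r kernel; the singular diagonal is set to 0 here.
    r = math.dist(p, q)
    return 1.0 / r if r > 0.0 else 0.0

def fill(a, b, points):
    # Dense filling of the submatrix for the cluster pair (a, b).
    return [[kernel(points[i], points[j]) for j in b.indices]
            for i in a.indices]

def build_bct(a, b, points, eta, leaves):
    """Recursive BCT construction; filling runs as soon as a leaf is found."""
    far = admissible(a, b, eta)
    if far or not a.children or not b.children:
        kind = "far" if far else "near"
        leaves.append((a.indices, b.indices, kind, fill(a, b, points)))
        return
    # Each recursive call is an independent unit of work that a
    # task-parallel runtime could hand to another worker.
    for ca in a.children:
        for cb in b.children:
            build_bct(ca, cb, points, eta, leaves)

# Small demo: 32 points on a line, clusters of at most 4 points.
points = [(float(i), 0.0, 0.0) for i in range(32)]
root = Cluster(list(range(len(points))), points, leaf_size=4)
leaves = []
build_bct(root, root, points, eta=1.0, leaves=leaves)
```

Because the amount of work below each call to `build_bct` depends on where admissible pairs happen to appear, the cost of a subtree is hard to estimate in advance, which is the motivation the abstract gives for dynamic rather than static load balancing.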