Abstract

A hierarchical matrix (H-matrix) is an approximated form that represents N × N correlations of N objects. H-matrix construction is achieved by dividing a matrix into submatrices (partitioning), followed by calculating these submatrices' element values (filling). Matrix partitioning consists of two steps: cluster tree (CT) construction, where objects are divided into clusters hierarchically; and block cluster tree (BCT) construction, which involves observing all cluster pairs at the same CT level that satisfies the admissibility condition. This study proposes two parallel implementation methods of partitioning operations on distributed memory systems (DMSs): distributed cluster tree construction (DCTC) and redundant cluster tree construction (RCTC). In DCTC, both CT and BCT constructions are parallelized using workers in all computing nodes. In RCTC, CT is constructed in every computing node redundantly by employing only intra-node work stealing. The BCT is then constructed in parallel using workers in all computing nodes. RCTC cannot achieve speedup using multiple computing nodes, but can eliminate the data exchange cost incurred by DCTC. We used the task-parallel language Tascell, which employs both intra- and inter-node work stealing, to handle arbitrary unbalanced tree construction and traversal on DMSs. Our RCTC implementations achieved a 1.11-1.20-fold speedup using up to 8 nodes × 36 workers in numerical experiments with 3D electric field analyses and N ≃ 108.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call