Abstract

Sparse supernodal Cholesky factorization on multi-NUMA systems is challenging because of supernode relaxation and load balancing. In this work, we propose a novel approach that improves the performance of sparse Cholesky by combining a deep-learning-guided relaxation parameter with a hierarchical, NUMA-aware parallelization strategy. Specifically, our relaxed supernodal algorithm uses a well-trained GCN model to adaptively adjust the relaxation parameter based on the structure of the sparse matrix, striking a proper balance between task-level parallelism and dense computational granularity. Additionally, the hierarchical parallelization maps supernodal tasks to NUMA-local parallel queues and updates contribution blocks in pipelined mode. Furthermore, stream scheduling with NUMA affinity further improves memory-access efficiency during numerical factorization. The experimental results show that HPS Cholesky outperforms state-of-the-art libraries such as Eigen LL^T, CHOLMOD, PaStiX, and SuiteSparse on 79.78%, 79.60%, 82.09%, and 74.47% of 1,128 datasets, respectively, and achieves an average speedup of 1.41x over the current best relaxation algorithm. Moreover, it surpasses MKL sparse Cholesky on 70.83% of the matrices on a Xeon Gold 6248.
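
As a rough illustration of the relaxation trade-off mentioned above (a minimal sketch, not the paper's HPS Cholesky implementation), the following Python snippet merges two adjacent supernodes only when the fraction of explicit zeros the merge introduces stays below a relaxation parameter. The Supernode fields, the try_merge helper, and the simplified row-count formula are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Supernode:
    ncols: int  # columns grouped in this supernode
    nrows: int  # rows of its dense block (>= ncols)

def try_merge(parent: Supernode, child: Supernode, relax_param: float) -> Optional[Supernode]:
    """Merge child into parent if the explicit zeros introduced stay under relax_param."""
    merged_ncols = parent.ncols + child.ncols
    # Simplified (assumed) row count for the merged block: the child's columns are
    # prepended to the parent's block, so it grows by at most child.ncols rows.
    merged_nrows = max(parent.nrows + child.ncols, child.nrows)
    nnz_before = parent.ncols * parent.nrows + child.ncols * child.nrows
    nnz_after = merged_ncols * merged_nrows
    fill_ratio = (nnz_after - nnz_before) / nnz_after
    return Supernode(merged_ncols, merged_nrows) if fill_ratio <= relax_param else None

# A small relaxation parameter keeps supernodes separate; a larger one merges them,
# trading extra explicit zeros for a larger dense block (better BLAS-3 granularity).
p, c = Supernode(ncols=8, nrows=40), Supernode(ncols=2, nrows=20)
print(try_merge(p, c, relax_param=0.05))   # None: too much fill for this threshold
print(try_merge(p, c, relax_param=0.30))   # Supernode(ncols=10, nrows=42)
```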
