Implementation of parallel sparse Cholesky factorization on GPU

Dan Zou, Yong Dou

doi:10.1109/iccsnt.2012.6526361

Abstract

Direct methods for solving large sparse symmetric positive-definite linear systems are popular because of their generality and robustness. Their main bottleneck is the sparse Cholesky factorization, which exhibits irregular memory access patterns and an unbalanced workload. Over the past decade, many sparse Cholesky factorization algorithms have emerged that exploit new architectural features. However, the programming techniques currently employed on these platforms are not sufficient to implement sparse Cholesky factorization on many-core graphics processing units (GPUs), because the irregular problem structure is poorly matched to the single-instruction, multiple-thread (SIMT) architecture of GPUs. In this paper, we propose a task-based software approach to parallel sparse Cholesky factorization aimed at heterogeneous computing platforms with GPU accelerators. The tasks are generated by the CPU. An efficient task-scheduling mechanism guarantees that tasks execute in the correct order and keeps the load balanced on the GPU. The approach is compared with an existing solver on problems arising from a range of practical applications. Experimental results show that the proposed approach substantially improves the performance of sparse Cholesky factorization on the GPU, achieving speedups of 2.7× to 4×.
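The idea of dependency-driven task scheduling can be illustrated with a minimal sketch. The code below is not the implementation described in this paper: it runs entirely on the CPU, stores the matrix densely for brevity, and uses a simple column-task decomposition chosen for illustration. The function name `task_based_cholesky` and the first-in, first-out ready queue are assumptions. Each column is treated as one task that becomes ready only after every earlier column with a nonzero in its row has applied its update, which is the ordering constraint a scheduler has to enforce.

```python
import math
from collections import deque


def task_based_cholesky(A):
    """Factor a symmetric positive-definite matrix A as L * L^T.

    Illustrative sketch only: one task per column, released from a
    ready queue once all of its dependencies have been satisfied.
    Returns the factor L and the order in which columns were executed.
    """
    n = len(A)
    # Work on a copy of the lower triangle of A.
    L = [[float(A[i][j]) if i >= j else 0.0 for j in range(n)] for i in range(n)]

    # Symbolic step: predict the nonzero pattern of L, including fill-in.
    pattern = [[i > j and A[i][j] != 0 for j in range(n)] for i in range(n)]
    for k in range(n):
        rows = [i for i in range(k + 1, n) if pattern[i][k]]
        for a in rows:
            for b in rows:
                if a > b:
                    pattern[a][b] = True

    # deps[j] = number of earlier columns that must still update column j.
    deps = [sum(1 for k in range(j) if pattern[j][k]) for j in range(n)]
    ready = deque(j for j in range(n) if deps[j] == 0)
    order = []

    while ready:
        k = ready.popleft()
        order.append(k)
        # Finalize column k: scale it by the square root of its diagonal.
        L[k][k] = math.sqrt(L[k][k])
        for i in range(k + 1, n):
            L[i][k] /= L[k][k]
        # Apply column k's update to every column that depends on it.
        for j in range(k + 1, n):
            if pattern[j][k]:
                for i in range(j, n):
                    L[i][j] -= L[i][k] * L[j][k]
                deps[j] -= 1
                if deps[j] == 0:
                    ready.append(j)  # all updates applied, column j is ready
    return L, order


if __name__ == "__main__":
    A = [[4.0, 2.0, 0.0],
         [2.0, 5.0, 0.0],
         [0.0, 0.0, 9.0]]
    L, order = task_based_cholesky(A)
    print(order)
    print(L)
```

In this example column 2 has no dependencies, so it is released before column 1, which must wait for the update from column 0. Columns that are ready at the same time are independent and could in principle be dispatched concurrently.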
