Supernodal sparse Cholesky factorization on graphics processing units

Dan Zou, Rongchun Li, Song Guo, Lin Deng, Yong Dou

doi:10.1002/cpe.3158

Abstract

Sparse Cholesky factorization is the most computationally intensive component in solving large sparse linear systems and is the core algorithm of numerous scientific computing applications. Many sparse Cholesky factorization algorithms have been proposed, each exploiting the architectural features of a particular computing platform. The recent use of graphics processing units (GPUs) to accelerate structured parallel applications shows the potential to achieve significant acceleration relative to desktop performance. However, sparse Cholesky factorization on GPUs has not been explored sufficiently because of the complexity of implementing it efficiently and concerns about low GPU utilization.

In this paper, we present a new approach for sparse Cholesky factorization on GPUs. We present the organization of the sparse matrix supernode data structure for the GPU and propose a queue‐based approach for the generation and scheduling of GPU tasks with dense linear algebraic operations. We also design a subtree‐based parallel method for multi‐GPU systems. These approaches increase GPU utilization, resulting in a substantial reduction in computation time.

Comparisons are made with existing parallel solvers by using problems arising from practical applications. The experimental results show that the proposed approaches can substantially improve sparse Cholesky factorization performance on GPUs. Relative to a highly optimized parallel algorithm on a 12‐core node, we obtained speedups in the range 1.59× to 2.31× by using one GPU and 1.80× to 3.21× by using two GPUs. Relative to a state‐of‐the‐art solver based on the supernodal method for CPU‐GPU heterogeneous platforms, we obtained speedups in the range 1.52× to 2.30× by using one GPU and 2.15× to 2.76× by using two GPUs.

Concurrency and Computation: Practice and Experience, 2013. Copyright © 2013 John Wiley & Sons, Ltd.
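The idea of expressing a factorization as a queue of dense linear algebra tasks can be illustrated with a small CPU-only sketch. This is not the authors' implementation: it assumes a dense symmetric positive definite matrix cut into fixed-width column blocks that stand in for supernodes, and the function name `blocked_cholesky`, the task labels (`potrf`, `trsm`, `syrk`, borrowed from the usual LAPACK/BLAS routine names), and the single FIFO queue are all hypothetical choices made for illustration. A real sparse solver would derive supernodes and task dependencies from the elimination tree and dispatch the tasks to GPU kernels.

```python
from collections import deque
import math

def blocked_cholesky(A, block):
    """Return (L, log): the lower Cholesky factor of SPD matrix A and the task order."""
    n = len(A)
    L = [row[:] for row in A]            # work on a copy; only the lower triangle is read
    starts = list(range(0, n, block))    # first column of each block (toy "supernode")
    queue = deque([("potrf", 0)])        # seed the queue with the first diagonal block
    log = []
    while queue:
        kind, k = queue.popleft()
        log.append((kind, k))
        k0 = starts[k]
        k1 = min(k0 + block, n)
        if kind == "potrf":
            # Dense Cholesky of the diagonal block L[k0:k1, k0:k1].
            for j in range(k0, k1):
                d = L[j][j] - sum(L[j][p] ** 2 for p in range(k0, j))
                L[j][j] = math.sqrt(d)
                for i in range(j + 1, k1):
                    s = sum(L[i][p] * L[j][p] for p in range(k0, j))
                    L[i][j] = (L[i][j] - s) / L[j][j]
            if k1 < n:
                queue.append(("trsm", k))
        elif kind == "trsm":
            # Triangular solve for the rows below the diagonal block.
            for i in range(k1, n):
                for j in range(k0, k1):
                    s = sum(L[i][p] * L[j][p] for p in range(k0, j))
                    L[i][j] = (L[i][j] - s) / L[j][j]
            queue.append(("syrk", k))
        else:
            # "syrk": symmetric update of the trailing lower triangle.
            for i in range(k1, n):
                for j in range(k1, i + 1):
                    L[i][j] -= sum(L[i][p] * L[j][p] for p in range(k0, k1))
            queue.append(("potrf", k + 1))
    for i in range(n):                   # clear the untouched upper triangle
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L, log
```

Calling `blocked_cholesky(A, block=2)` on a small matrix returns the factor together with the order in which the tasks were dequeued. In this dense toy case the tasks form a single chain, so the queue never holds more than one entry; the independent subtrees of a sparse elimination tree are what would let several tasks be ready at once.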

Journal: Concurrency and Computation: Practice and Experience
Publication Date: Oct 11, 2013
Citations: 4

Similar Papers

Implementation of parallel sparse Cholesky factorization on GPU
Dan Zou ... Yong Dou
01 Dec 2012

Adaptive signal processing for multichannel sound using high performance computing
Jorge Lorente Giner
02 Dec 2015

Speeding up audio fingerprinting over GPUs
Chung-Che Wang ... Jyh-Shing Roger Jang
01 Jul 2014

GPGPU Task Scheduling Technique for Reducing the Performance Deviation of Multiple GPGPU Tasks in RPC-Based GPU Virtualization Environments
Jihun Kang ... Heonchang Yu
Symmetry | VOL. 13
20 Mar 2021
