LU Factorization with Partial Pivoting for a Multicore System with Accelerators

J Kurzak, J Dongarra, M Faverge, P Luszczek

doi:10.1109/tpds.2012.242

Abstract

LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This paper presents an implementation of the algorithm for a hybrid shared-memory system with standard CPU cores and GPU accelerators. The difficulty of implementing the algorithm for such a system lies in the disproportion between the computational power of the CPUs and that of the GPUs, and in the meager bandwidth of the communication link between their memory systems. An additional challenge comes from the memory-bound and synchronization-rich nature of the panel factorization component of the block LU algorithm, which is imposed by the use of partial pivoting. The challenges are tackled with a data layout geared toward complex memory hierarchies, autotuning of GPU kernels, fine-grain parallelization of memory-bound CPU operations, and dynamic scheduling of tasks to different devices. Performance in excess of one TeraFLOPS is achieved using four AMD Magny Cours CPUs and four NVIDIA Fermi GPUs.
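The structure of the block LU algorithm mentioned in the abstract can be illustrated with a minimal sequential sketch. This is not the paper's implementation: it runs on a single core in pure Python, and the function name `lu_partial_pivoting` and the block-size parameter `nb` are hypothetical names chosen for illustration. It only shows the three steps of each block iteration: the panel factorization (where the pivot search and row swaps make the work memory-bound and sequentially dependent), the triangular solve, and the trailing submatrix update (a matrix multiply).

```python
def lu_partial_pivoting(a, nb=2):
    """Blocked right-looking LU with partial pivoting (illustrative sketch).

    Returns (lu, perm): lu stores the unit-lower factor L below the diagonal
    and U on and above it; perm[i] is the original row now in position i,
    so that row i of L*U equals row perm[i] of the input matrix.
    """
    n = len(a)
    lu = [[float(x) for x in row] for row in a]  # work on a copy
    perm = list(range(n))

    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)

        # Step 1: panel factorization of columns k0..k1-1.
        for k in range(k0, k1):
            # Pivot search: largest magnitude on or below the diagonal.
            p = max(range(k, n), key=lambda i: abs(lu[i][k]))
            if lu[p][k] == 0.0:
                raise ValueError("matrix is singular")
            if p != k:
                lu[k], lu[p] = lu[p], lu[k]          # swap whole rows
                perm[k], perm[p] = perm[p], perm[k]
            for i in range(k + 1, n):
                lu[i][k] /= lu[k][k]                 # multiplier (entry of L)
                for j in range(k + 1, k1):           # update inside the panel only
                    lu[i][j] -= lu[i][k] * lu[k][j]

        # Step 2: triangular solve, U12 = inv(L11) * A12.
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    lu[i][j] -= lu[i][k] * lu[k][j]

        # Step 3: trailing update, A22 = A22 - L21 * U12 (matrix multiply).
        for i in range(k1, n):
            for j in range(k1, n):
                s = 0.0
                for k in range(k0, k1):
                    s += lu[i][k] * lu[k][j]
                lu[i][j] -= s

    return lu, perm
```

In this sketch, step 1 touches only a narrow panel and must finish each column before starting the next, while step 3 is a dense matrix multiply over the rest of the matrix. That contrast is what makes the panel hard to parallelize and the update a natural fit for high-throughput hardware.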

Journal: IEEE Transactions on Parallel and Distributed Systems
Publication Date: Apr 9, 2013
Citations: 67

Similar Papers

LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System
Jakub Kurzak ... Piotr Luszczek
01 Mar 2012

Programming the LU Factorization for a Multicore System with Accelerators
Jakub Kurzak ...
01 Jan 2013

Chapter 11 - Gaussian Elimination and the LU Decomposition
William Ford
Numerical Linear Algebra with Applications
19 Sep 2014

Rounding errors in algebraic processes: by J. H. Wilkinson. 161 pages, diagrams, 6 × 9 in. Englewood Cliffs, Prentice-Hall Inc., 1964. Price, $6.00
Journal of the Franklin Institute | VOL. 277
01 Jun 1964
