Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs

Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack Dongarra

doi:10.1016/j.procs.2016.05.303

Abstract

Solving a large number of relatively small linear systems has recently drawn more attention in the HPC community, due to the importance of such computational workloads in many scientific applications, including sparse multifrontal solvers. Modern hardware accelerators and their architectures require a set of optimization techniques that are very different from the ones used to solve a single, relatively large matrix. In order to impose concurrency on such throughput-oriented architectures, a common practice is to batch the solution of these matrices as one task offloaded to the underlying hardware, rather than solving them individually. This paper presents a high-performance batched Cholesky factorization for large sets of relatively small matrices using Graphics Processing Units (GPUs), and addresses both fixed and variable size batched problems. We investigate various algorithm designs and optimization techniques, and show that it is essential to combine kernel design with performance tuning in order to achieve the best possible performance. We compare our approaches against state-of-the-art CPU solutions as well as GPU-based solutions using existing libraries, and show that, on a K40c GPU for example, our kernels are more than 2× faster than these alternatives.
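The batching idea described in the abstract can be illustrated with a small CPU reference sketch. This is not the authors' GPU implementation: it is a serial Python loop, assuming a lower-triangular factorization A = L·L^T, a batch given as a plain list whose matrices may differ in size (the variable-size case), and a per-matrix `info` status loosely modeled on the LAPACK `potrf` convention (0 on success, otherwise the 1-based column where a non-positive pivot appeared). The names `cholesky` and `batched_cholesky` are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Tuple[Matrix, int]:
    """Left-looking Cholesky of one symmetric matrix: returns (L, info) with A = L @ L^T.

    Hypothetical helper for illustration. Only the lower triangle of `a` is read.
    info == 0 on success; otherwise info is the 1-based column whose pivot was
    not positive (loosely modeled on LAPACK potrf), and L is only partially filled.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract contributions of the already-computed columns.
        s = a[j][j] - sum(L[j][k] * L[j][k] for k in range(j))
        if s <= 0.0:
            return L, j + 1  # matrix is not positive definite
        L[j][j] = math.sqrt(s)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L, 0


def batched_cholesky(batch: List[Matrix]) -> Tuple[List[Matrix], List[int]]:
    """Factor every matrix in `batch`; sizes may differ (variable-size batch).

    Hypothetical serial reference: the factorizations are independent, so a GPU
    version could process all of them concurrently as one offloaded task.
    """
    factors: List[Matrix] = []
    infos: List[int] = []
    for a in batch:
        L, info = cholesky(a)
        factors.append(L)
        infos.append(info)
    return factors, infos


if __name__ == "__main__":
    batch = [
        [[4.0, 2.0], [2.0, 3.0]],                                    # 2x2, SPD
        [[25.0, 15.0, -5.0], [15.0, 18.0, 0.0], [-5.0, 0.0, 11.0]],  # 3x3, SPD
        [[1.0, 2.0], [2.0, 1.0]],                                    # 2x2, not SPD
    ]
    factors, infos = batched_cholesky(batch)
    print(infos)       # [0, 0, 2]
    print(factors[1])  # [[5.0, 0.0, 0.0], [3.0, 3.0, 0.0], [-1.0, 1.0, 3.0]]
```

Because every matrix in the batch is independent, the loop in `batched_cholesky` is the part a GPU implementation would run concurrently, for instance by assigning separate matrices to separate thread blocks (an assumption about one common strategy, not a description of the paper's kernels). Each matrix carries its own dimension, so the same routine covers both fixed-size and variable-size batches, and one non-positive-definite matrix only sets its own `info` entry without stopping the rest of the batch.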


Journal: Procedia Computer Science
Publication Date: Jan 1, 2016
Citations: 13
License type: cc-by-nc-nd


Similar Papers

Protein alignment algorithms with an efficient backtracking routine on multiple GPUs
Jacek Blazewicz ... Pawel Wojciechowski
BMC Bioinformatics | VOL. 12 | 20 May 2011

Multigrid on GPU: tackling power grid analysis on parallel SIMT platforms
10 Nov 2008

A Survey of Performance Tuning Techniques and Tools for Parallel Applications
Dheya Mustafa
IEEE Access | VOL. 10 | 01 Jan 2021

Multigrid on GPU: Tackling Power Grid Analysis on parallel SIMT platforms
Zhuo Feng ... Peng Li
01 Nov 2008
