Abstract

Many problems in diverse areas of science and engineering involve the solution of large-scale sparse systems of linear equations. In most of these scenarios they are also a computational bottleneck, and therefore their efficient solution on parallel architectures has motivated a tremendous volume of research. This dissertation targets the use of GPUs to enhance the performance of the solution of sparse linear systems using iterative methods complemented with state-of-the-art preconditioning techniques. In particular, we study ILUPACK, a package for the solution of sparse linear systems via Krylov subspace methods that relies on a modern inverse-based multilevel ILU (incomplete LU) preconditioning technique. We present new data-parallel versions of the preconditioner and of the most important solvers contained in the package that significantly improve its performance without affecting its accuracy. Additionally, we enhance existing task-parallel versions of ILUPACK for shared- and distributed-memory systems with the inclusion of GPU acceleration. The results obtained show a considerable reduction in the runtime of the methods, as well as the possibility of addressing large-scale problems efficiently.

Highlights

  • Sparse systems of linear equations appear in many areas of knowledge, such as circuit simulation, optimal control, quantum mechanics or economics [1, 2]

  • A number of current real-world applications involve linear systems with millions of equations and unknowns. Direct solvers such as those based on Gaussian Elimination (GE) [3], which apply a sequence of matrix transformations to reach an equivalent but easier-to-solve system, today fall short when solving large-scale problems because of their excessive memory requirements, impractical time-to-solution, and implementation complexity (see the direct-solve sketch after this list)

  • The favorable numerical properties of ILUPACK’s preconditioner in the context of iterative solvers come at the cost of expensive construction and application procedures, especially for large-scale sparse linear systems. This high computational cost motivated the development of task-parallel implementations of ILUPACK for shared-memory and message-passing platforms [5, 6, 7] but, despite showing good performance and scalability results, these variants of ILUPACK are limited to the solution of symmetric positive-definite (SPD) linear systems, and they slightly modify the preconditioner to exploit task-parallelism
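To make the direct-solver discussion concrete, the following minimal sketch (plain SciPy, not code from the dissertation or from ILUPACK; the tridiagonal test matrix is a hypothetical convection-diffusion-like example) solves a sparse system with a GE-based sparse LU factorization. The fill-in generated during factorization is precisely what makes this approach impractical at very large scales.

```python
# Minimal sketch, assuming SciPy is available: a direct sparse solve via LU
# factorization (the GE-based approach referred to above). NOT ILUPACK code.
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import splu

n = 1_000
# Hypothetical nonsymmetric tridiagonal test matrix (convection-diffusion-like).
A = diags([-1.0, 2.0, -1.2], offsets=[-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

lu = splu(A)     # sparse LU factorization; fill-in grows quickly for 3D PDE problems
x = lu.solve(b)  # forward/backward triangular solves
print("residual:", np.linalg.norm(A @ x - b))
```

For the problem sizes the dissertation targets, the memory consumed by the computed L and U factors makes this route infeasible, which motivates the preconditioned iterative alternative sketched after the Introduction below.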


Summary

Introduction

Sparse systems of linear equations appear in many areas of knowledge, such as circuit simulation, optimal control, quantum mechanics or economics [1, 2]. A number of current real-world applications (for example, three-dimensional PDEs) involve linear systems with millions of equations and unknowns. Direct solvers such as those based on Gaussian Elimination (GE) [3], which apply a sequence of matrix transformations to reach an equivalent but easier-to-solve system, today fall short when solving large-scale problems because of their excessive memory requirements, impractical time-to-solution, and implementation complexity. The favorable numerical properties of ILUPACK’s preconditioner in the context of iterative solvers come at the cost of expensive construction and application procedures, especially for large-scale sparse linear systems. This high computational cost motivated the development of task-parallel implementations of ILUPACK for shared-memory and message-passing platforms [5, 6, 7] but, despite showing good performance and scalability results, these variants of ILUPACK are limited to the solution of symmetric positive-definite (SPD) linear systems, and they slightly modify the preconditioner to exploit task-parallelism. The experimental evaluation is performed on the Jetson AGX Xavier board, one of the latest low-power computing platforms from NVIDIA (Section 8).
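The pairing that the dissertation accelerates, an ILU-type preconditioner driving a Krylov solver, can be illustrated with the following minimal SciPy sketch. It uses a simplified single-level ILU (spilu) rather than ILUPACK’s inverse-based multilevel preconditioner, and the same hypothetical test matrix as the earlier sketch; it is an analogue of the general technique, not the package’s implementation.

```python
# Minimal sketch, assuming SciPy: a single-level ILU preconditioner applied
# inside BiCGStab. A simplified analogue of the ILU-preconditioned Krylov
# approach, NOT ILUPACK's inverse-based multilevel ILU.
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spilu, bicgstab, LinearOperator

n = 100_000
A = diags([-1.0, 2.0, -1.2], offsets=[-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = spilu(A, drop_tol=1e-4, fill_factor=10)  # approximate (incomplete) LU factors
M = LinearOperator((n, n), matvec=ilu.solve)   # preconditioner: apply M^{-1}r via triangular solves
x, info = bicgstab(A, b, M=M)                  # info == 0 signals convergence
print(info, np.linalg.norm(A @ x - b))
```

The kernels that dominate each iteration here (the sparse matrix-vector product and the sparse triangular solves that apply the preconditioner) are the operations that the data-parallel variants developed in the dissertation offload to the GPU.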

Objectives
Sparse systems of linear equations
ILUPACK
Task-parallel ILUPACK
Related work
Data-parallel variants
Leveraging task and data parallelism in BiCG
Variant for two GPUs
Leveraging task-parallelism in the GPU using streams
Concurrent CPU-GPU variant
Self-scheduled sparse triangular solvers
Variant of BiCGStab for low-power devices
Concluding Remarks