Exploiting task and data parallelism in ILUPACK’s preconditioned CG solver on NUMA architectures and many-core accelerators

José I Aliaga,Pablo Ezzatti,Rosa M Badia,Matthias Bollhöfer,Enrique S Quintana-Ortí,Maria Barreda,Ernesto Dufrechou

doi:10.1016/j.parco.2015.12.004

Open Access

Journal: Parallel Computing
Publication Date: Dec 11, 2015
Citations: 12
License type: public-domain

Affiliation: Jaume I University, Universidad de la República, Technische Universität Braunschweig

Keywords: Non-Uniform Memory Access Architectures, Intel Xeon Phi

Abstract

We present specialized implementations of the preconditioned iterative linear system solver in ILUPACK for Non-Uniform Memory Access (NUMA) platforms and for many-core hardware co-processors based on the Intel Xeon Phi and graphics accelerators. For conventional x86 architectures, we exploit task parallelism in two ways: through the OmpSs runtime, which yields a dynamic schedule of the work to the cores, and through a message-passing implementation based on MPI, which yields a static schedule. In both cases the numeric semantics differ from those of the sequential ILUPACK. For the graphics processor, we exploit data parallelism by off-loading the computationally expensive kernels to the accelerator while keeping the numeric semantics of the sequential case.
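For orientation, the preconditioned CG iteration at the core of such a solver can be sketched as follows. This is a minimal illustration in plain Python and not ILUPACK's code: the function names (`matvec`, `dot`, `pcg`) are hypothetical, the matrix is stored densely, and a simple Jacobi (diagonal) preconditioner stands in for ILUPACK's far more elaborate ILU-based preconditioner. The matrix-vector product and the vector updates are the kind of kernels that lend themselves to data-parallel off-loading.

```python
# Minimal preconditioned CG sketch in pure Python (no external libraries).
# Assumption: a Jacobi (diagonal) preconditioner replaces ILUPACK's
# ILU-based preconditioner purely for illustration.

def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, apply_prec, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A.

    apply_prec(r) must return an approximation of A^{-1} r.
    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the initial x = 0
    z = apply_prec(r)                # preconditioned residual
    p = list(z)                      # first search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)      # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_prec(r)
        rz_new = dot(r, z)
        beta = rz_new / rz           # makes the new direction A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Small symmetric positive definite test system (illustrative data only).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
diag = [A[i][i] for i in range(len(A))]

def jacobi(r):
    """Apply the diagonal preconditioner: divide each entry by A[i][i]."""
    return [ri / di for ri, di in zip(r, diag)]

x, iters = pcg(A, b, jacobi)
```

Swapping `jacobi` for a stronger preconditioner changes only the `apply_prec` call, which is where an ILU-based preconditioner would be applied in each iteration.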