Abstract
We describe the design of a sparse direct solver for symmetric positive-definite systems using the PaRSEC runtime system. In this approach the application is represented as a directed acyclic graph (DAG) of tasks, and the runtime system executes the DAG on the target architecture. Delegating task scheduling and data management to the runtime system makes the code portable across different architectures. Although runtime systems have been widely exploited in dense linear algebra, the DAGs arising in sparse linear algebra algorithms remain a challenge for such tools because of their irregularity. In addition to the overheads induced by the runtime system, the programming model used to describe the DAG affects the performance and scalability of the code. In this study we investigate the use of a Parametrized Task Graph (PTG) model for implementing a task-based supernodal method. We discuss the benefits and limitations of this model compared to the popular Sequential Task Flow (STF) model and conduct numerical experiments on a multicore system to assess our approach. We also validate the performance of our solver, SpLLT, by comparing it to the state-of-the-art solver MA87 from the HSL library.
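The contrast between the STF and PTG models can be illustrated with a small sketch. It is plain Python, not PaRSEC's task-description language, and it assumes a dense tiled Cholesky factorization (tasks POTRF, TRSM, UPDATE on an nt-by-nt tile grid) rather than the sparse supernodal structure the solver handles. The function names `stf_dag` and `ptg_preds` are hypothetical. In the STF style the DAG is discovered by submitting tasks in sequential order and tracking data accesses; in the PTG style the predecessors of any task are given by a closed-form rule on its parameters, so no unrolling of the whole graph is needed.

```python
from collections import defaultdict


def stf_dag(nt):
    """STF style: submit tasks in sequential order, infer dependencies from data accesses."""
    last_writer = {}           # tile -> task that last wrote it
    preds = defaultdict(set)   # task -> set of predecessor tasks

    def submit(task, reads, writes):
        preds[task]  # register the task even if it has no predecessors
        # Only last-writer tracking is done here. That is enough for this
        # algorithm because no tile is written again after another task read it.
        for tile in reads + writes:
            if tile in last_writer:
                preds[task].add(last_writer[tile])
        for tile in writes:
            last_writer[tile] = task

    for k in range(nt):
        submit(("POTRF", k), [], [(k, k)])
        for i in range(k + 1, nt):
            submit(("TRSM", k, i), [(k, k)], [(i, k)])
        for i in range(k + 1, nt):
            for j in range(k + 1, i + 1):
                submit(("UPDATE", k, i, j), [(i, k), (j, k)], [(i, j)])
    return dict(preds)


def ptg_preds(task):
    """PTG style: predecessors of a task as a function of its parameters only."""
    kind = task[0]
    if kind == "POTRF":
        k = task[1]
        return {("UPDATE", k - 1, k, k)} if k > 0 else set()
    if kind == "TRSM":
        _, k, i = task
        deps = {("POTRF", k)}
        if k > 0:
            deps.add(("UPDATE", k - 1, i, k))
        return deps
    _, k, i, j = task  # UPDATE of tile (i, j) using column k
    deps = {("TRSM", k, i), ("TRSM", k, j)}
    if k > 0:
        deps.add(("UPDATE", k - 1, i, j))
    return deps


if __name__ == "__main__":
    dag = stf_dag(4)
    # Both descriptions should produce the same dependencies for every task.
    assert all(ptg_preds(task) == deps for task, deps in dag.items())
    print(len(dag), "tasks checked")
```

Note that `stf_dag` has to traverse the entire sequential code to build the graph, whereas `ptg_preds` can be evaluated for a single task in isolation. This is the kind of trade-off between the two models that the study discusses.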
Highlights
We investigate the use of a runtime system for implementing a sparse Cholesky decomposition to solve the linear system Ax = b (1.1), where A is a large sparse symmetric positive-definite matrix.
The code is compiled with the GNU compiler, the BLAS and LAPACK routines are provided by the Intel MKL v11.3 library, and we use the latest version of the PaRSEC runtime system.
In this study we presented the design of a task-based sparse Cholesky solver based on a Parametrized Task Graph (PTG) model and implemented with the PaRSEC runtime system.
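For reference, the standard way a sparse Cholesky factorization is used to solve a system of the form Ax = b can be written as below. The fill-reducing permutation P and the intermediate vectors y and z are notation introduced here for illustration and are not taken from the source.

```latex
\begin{align}
  P A P^{T} &= L L^{T}, \\
  L y &= P b, \\
  L^{T} z &= y, \qquad x = P^{T} z .
\end{align}
```

The first line is the factorization phase, with L lower triangular; the remaining lines are the forward and backward triangular solves.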
Summary
We consider the linear system Ax = b, where A is a large sparse symmetric positive-definite matrix. In this approach the runtime system acts as a software layer between our application and the target architecture and enables portability of our code across different architectures. Many dense linear algebra software packages have already exploited this approach and have shown that it is efficient on modern architectures ranging from multicores to large-scale machines, including heterogeneous systems. Two examples of such libraries are DPLASMA [4], built with the PaRSEC [5] runtime system, and Chameleon, which has an interface to several runtime systems including StarPU [3] and PaRSEC. The PLASMA package [10], which used to rely on the QUARK runtime system, has been ported to OpenMP using the tasking features offered in the latest versions of the standard. This transition improved the portability and maintainability of the library and did not impact the performance of the code [15]. We use the PaRSEC runtime system to implement the PTG model and compare it with our existing OpenMP implementation and the HSL MA87 solver.