Abstract

This chapter documents the implementation of a parallel, distributed-memory, out-of-core (OOC) solver for performing LU and Cholesky factorizations of a large dense matrix on clusters equipped with Intel® Xeon Phi™ coprocessors. The OOC solver takes advantage of NVIDIA graphics processing units (GPUs) or Intel Xeon Phi coprocessors (MICs) and allows problems larger than device memory to be solved. The solver is built to be compatible with the data format of the ScaLAPACK software library, so existing codes that use ScaLAPACK can adopt it readily. This chapter highlights the techniques used to enable the code to run efficiently on both NVIDIA GPUs and Intel MICs. Numerical results are provided for Beacon (an Intel Xeon plus Intel Xeon Phi cluster composed of 48 nodes with multicore CPUs and MICs) at the National Institute for Computational Sciences and for the Keeneland GPU cluster.
