Abstract

Low-rank matrices arise in many scientific and engineering computations. Both computational and storage costs of manipulating such matrices may be reduced by taking advantages of their low-rank properties. To compute a low-rank approximation of a dense matrix, in this paper, we study the performance of QR factorization with column pivoting or with restricted pivoting on multicore CPUs with a GPU. We first propose several techniques to reduce the postprocessing time, which is required for restricted pivoting, on a modern CPU. We then examine the potential of using a GPU to accelerate the factorization process with both column and restricted pivoting. Our performance results on two eight-core Intel Sandy Bridge CPUs with one NVIDIA Kepler GPU demonstrate that using the GPU, the factorization time can be reduced by a factor of more than two. In addition, to study the performance of our implementations in practice, we integrate them into a recently developed software StruMF which algebraically exploits such low-rank structures for solving a general sparse linear system of equations. Our performance results for solving Poisson's equations demonstrate that the proposed techniques can significantly reduce the preconditioner construction time of StruMF on the CPUs, and the construction time can be further reduced by 10%–50% using the GPU.

Highlights

  • In applied and numerical mathematics or in scientific and engineering simulations, we often encounter low-rank matrices, and more frequently, we encounter matrices whose submatrices are low-rank

  • We studied the performance of QP3 and QPR for computing the low-rank approximation of dense submatrices

  • We investigated the potential of using a GPU to accelerate the factorization time

Read more

Summary

Introduction

In applied and numerical mathematics or in scientific and engineering simulations, we often encounter low-rank matrices, and more frequently, we encounter matrices whose submatrices are low-rank. We study the performance of the following two algorithms for computing a low-rank approximation of a dense matrix on multicore CPUs and investigate the potential of using a GPU to accelerate the process: (1) the QR factorization with column pivoting (QP3) [1, 2] and (2) the QR factorization with restricted pivoting (QPR) [3, 4]. StruMF algebraically exploits a low-rank structure, referred to as a hierarchically semiseparable (HSS) structure [7], of a coefficient matrix while computing and applying a preconditioner, and uses QP3 and QPR for computing the low-rank approximation of the dense submatrices. Throughout this paper, the jth column of a matrix A is denoted by aj, while Ai1:i2,j1:j2 is the submatrix consisting of the i1th through the i2th rows and the j1th through the j2th columns of A

QR Algorithms with Column Pivoting
Case Studies with StruMF on a CPU
GPU Implementations
Performance Studies with a GPU
Findings
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call