Abstract

The QR factorization with column pivoting (QRP), originally suggested by Golub [Numer. Math., 7 (1965), 206--216], is a popular approach to computing rank-revealing factorizations. Using Level 1 BLAS, it was implemented in LINPACK, and, using Level 2 BLAS, in LAPACK. While the Level 2 BLAS version delivers superior performance in general, it may result in worse performance for large matrix sizes due to cache effects. We introduce a modification of the QRP algorithm which allows the use of Level 3 BLAS kernels while maintaining the numerical behavior of the LINPACK and LAPACK implementations. Experimental comparisons of this approach with the LINPACK and LAPACK implementations on IBM RS/6000, SGI R8000, and DEC AXP platforms show considerable performance improvements.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call