Abstract
A parallel algorithm for reducing banded matrices to bidiagonal form is presented. In contrast to the rotation-based “standard approach”, our algorithm is based on Householder transforms, therefore exhibiting considerably higher data locality (BLAS level 2 instead of level 1). The update of the transformation matrices which involves the vast majority of the operations can even be blocked to allow the use of level 3 BLAS. Thus, our algorithm will outperform the standard method on a serial computer with a distinct memory hierarchy. In addition, the algorithm can be efficiently implemented in a distributed memory environment, as is demonstrated by numerical results on the Intel Paragon.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have