Abstract

We discuss a parallel shared-memory implementation of multifrontal QR factorization. To achieve high performance for general large, sparse matrices, a combination of tree-level and node-level parallelism is used. Acceptable load balancing is obtained with a pool-of-tasks approach. For the storage of frontal and update matrices, we use a buddy system based on Fibonacci blocks, which turns out to be more efficient than the blocks of size 2^i proposed by other authors. The order in which memory space for update and frontal matrices is allocated is also shown to be important. An implementation of the proposed algorithm on the CRAY X-MP/416 (four processors) gives speedups of about three, with about 20% extra real memory space required.
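The difference between Fibonacci blocks and blocks of size 2^i can be illustrated with a minimal sketch. It is not the authors' allocator, whose details the abstract does not give; it only shows how a request is rounded up to the next available block size under each scheme and how a Fibonacci block splits into two smaller buddies. All function names here are hypothetical, and the Fibonacci sequence is assumed to start 1, 2, 3, 5, ...

```python
from bisect import bisect_left


def fibonacci_sizes(limit):
    """Block sizes 1, 2, 3, 5, 8, ... ending with the first value >= limit."""
    sizes = [1, 2]
    while sizes[-1] < limit:
        sizes.append(sizes[-1] + sizes[-2])
    return sizes


def power_of_two_sizes(limit):
    """Block sizes 1, 2, 4, 8, ... ending with the first value >= limit."""
    sizes = [1]
    while sizes[-1] < limit:
        sizes.append(sizes[-1] * 2)
    return sizes


def round_up(request, sizes):
    """Smallest block size in the sorted list `sizes` that can hold `request`."""
    return sizes[bisect_left(sizes, request)]


def split_fibonacci(size, sizes):
    """Split a Fibonacci block F_k into its two buddies F_(k-1) and F_(k-2)."""
    k = sizes.index(size)
    if k < 2:
        raise ValueError("block is too small to split")
    return sizes[k - 1], sizes[k - 2]


if __name__ == "__main__":
    fib = fibonacci_sizes(1000)
    pow2 = power_of_two_sizes(1000)
    for request in (130, 300, 600):
        f, p = round_up(request, fib), round_up(request, pow2)
        print(f"request {request}: Fibonacci block {f}, power-of-two block {p}")
```

Because consecutive Fibonacci sizes grow by a factor of roughly 1.62 rather than 2, the worst-case space wasted inside a block is smaller. Individual requests can still fit a power-of-two block more tightly: a request of 100 words fits a block of 128 but needs a Fibonacci block of 144.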
