Abstract

This paper introduces a new parallel QR decomposition algorithm with two key advantages over previous techniques. First, it is specifically designed to achieve high parallel efficiency on shared memory parallel computers with a modest number of processors. Second, its novel load balancing method considers total computational work rather than balancing only the number of Givens rotations. As a result, expected efficiency approaches optimal as the problem size grows relative to the number of processors. The hybrid nature of the algorithm seeks to maximize the computation performed between communication and synchronization points. Implementation results on shared memory multiprocessors track expected performance well up to 12 processors, and initial performance results on a distributed shared memory machine are also presented.
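
To make the load balancing point concrete, the following is a minimal sequential sketch of QR reduction by Givens rotations. It is not the parallel algorithm of the paper. The names `givens`, `givens_qr`, and `work` are hypothetical and chosen for illustration, and the elimination order (bottom row upward within each column) is an assumption. The sketch records how many element pairs each rotation updates, which shows why rotations in different columns do not cost the same.

```python
import math


def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0  # nothing to eliminate
    r = math.hypot(a, b)
    return a / r, b / r


def givens_qr(A):
    """Reduce an m x n matrix (list of rows, m >= n) to upper-triangular form.

    Returns (R, work), where work[j] is the number of element pairs updated
    while eliminating the subdiagonal entries of column j.
    """
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]  # work on a copy
    work = [0] * n
    for j in range(n):
        # Zero R[i][j] by rotating rows i-1 and i, from the bottom row upward.
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1][j], R[i][j])
            # Columns left of j are already zero in both rows, so only
            # the n - j trailing columns need updating.
            for k in range(j, n):
                upper, lower = R[i - 1][k], R[i][k]
                R[i - 1][k] = c * upper + s * lower
                R[i][k] = -s * upper + c * lower
            work[j] += n - j
    return R, work
```

For a 4 x 3 matrix, the sketch applies 3, 2, and 1 rotations in columns 0, 1, and 2, but the recorded work is 9, 4, and 1 element pairs. Assigning equal numbers of rotations to each processor would therefore leave the work uneven, which is the kind of imbalance a scheme based on total computational work is meant to avoid.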
