Abstract
This paper introduces a new parallel QR decomposition algorithm with two key advantages over previous techniques. First, it is specifically designed to achieve high parallel efficiency on shared memory parallel computers with a modest number of processors. Second, its novel load balancing method considers total computational work rather than balancing only the Givens rotations. As a result, expected efficiencies approach optimal as the problem size grows relative to the number of processors. The hybrid nature of the algorithm seeks to maximize the computation performed between communication and synchronization points. Implementation results on shared memory multiprocessors track expected performance well up to 12 processors, and initial performance results on a distributed shared memory machine are also presented.