Abstract
The Modified Gram-Schmidt (MGS) orthogonalization process -- used for example in the Arnoldi algorithm -- constitutes often the bottleneck that limits parallel efficiencies. Indeed, a number of communications, proportional to the square of the problem size, is required to compute the dot-products. A block formulation is attractive but it suffers from potential numerical instability.In this paper, we address this issue and propose a simple procedure that allows the use of a Block Gram-Schmidt algorithm while guaranteeing a numerical accuracy similar to MGS. The main idea is to dynamically determine the size of the blocks. The main advantage of this dynamic procedure are two-folds: first, high performance matrix-vector multiplications can be used to decrease the execution time. Next, in a parallel environment, the number of communications is reduced. Performance comparisons with the alternative Iterated CGS also show an improvement for moderate number of processors.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have