A Stable and Efficient Parallel Block Gram-Schmidt Algorithm

Denis Vanderstraeten

doi:10.1007/3-540-48311-x_158

Abstract

The Modified Gram-Schmidt (MGS) orthogonalization process -- used for example in the Arnoldi algorithm -- constitutes often the bottleneck that limits parallel efficiencies. Indeed, a number of communications, proportional to the square of the problem size, is required to compute the dot-products. A block formulation is attractive but it suffers from potential numerical instability.In this paper, we address this issue and propose a simple procedure that allows the use of a Block Gram-Schmidt algorithm while guaranteeing a numerical accuracy similar to MGS. The main idea is to dynamically determine the size of the blocks. The main advantage of this dynamic procedure are two-folds: first, high performance matrix-vector multiplications can be used to decrease the execution time. Next, in a parallel environment, the number of communications is reduced. Performance comparisons with the alternative Iterated CGS also show an improvement for moderate number of processors.

Full Text