The Cholesky QR algorithm is an efficient communication-minimizing algorithm for computing the QR factorization of a tall-skinny matrix $X\in\mathbb{R}^{m\times n}$, where $m\gg n$. Unfortunately it is inherently unstable and often breaks down when the matrix is ill-conditioned. A recent work [Yamamoto et al., ETNA, 44, pp. 306--326 (2015)] establishes that the instability can be cured by repeating the algorithm twice (called CholeskyQR$2$). However, the applicability of CholeskyQR$2$ is still limited by the requirement that the Cholesky factorization of the Gram matrix $X^{\top} X$ runs to completion, which means that it does not always work for matrices $X$ with the 2-norm condition number $\kappa_2(X)$ roughly greater than ${\bf u}^{-{1}/{2}}$, where ${\bf u}$ is the unit roundoff. In this work we extend the applicability to $\kappa_2(X)=\mathcal{O}({\bf u}^{-1})$ by introducing a shift to the computed Gram matrix so as to guarantee the Cholesky factorization $R^{\top}R= A^{\top}A+sI$ succeeds numerically. We show that the computed $AR^{-1}$ has reduced condition number that is roughly bounded by ${\bf u}^{-{1}/{2}}$, for which CholeskyQR$2$ safely computes the QR factorization, yielding a computed $Q$ of orthogonality $\|Q^{\top}Q-I\|_2$ and residual $\|A-QR\|_F/\|A\|_F$ both of the order of ${\bf u}$. Thus we obtain the required QR factorization by essentially running Cholesky QR thrice. We extensively analyze the resulting algorithm shiftedCholeskyQR3 to reveal its excellent numerical stability. The shiftedCholeskyQR3 algorithm is also highly parallelizable, and applicable and effective also when working with an oblique inner product. We illustrate our findings through experiments, in which we achieve significant speedup over alternative methods.
Read full abstract