Abstract

This paper concerns the worst-case complexity of cyclic coordinate descent (C-CD) for minimizing a convex quadratic function, which in this simple setting is equivalent to the Gauss–Seidel method, the Kaczmarz method, and projection onto convex sets (POCS). We observe that the best known complexity bound for C-CD is $$\mathcal{O}(n^2)$$ times worse than that of randomized coordinate descent (R-CD), but no example had been proven to exhibit such a large gap. In this paper we show that the gap indeed exists. We prove that there exists an example for which C-CD takes at least $$\mathcal{O}(n^4 \kappa_{\text{CD}} \log \frac{1}{\epsilon})$$ operations, where $$\kappa_{\text{CD}}$$ is related to Demmel's condition number and determines the convergence rate of R-CD. This implies that in the worst case C-CD can indeed be $$\mathcal{O}(n^2)$$ times slower than R-CD, whose complexity is $$\mathcal{O}(n^2 \kappa_{\text{CD}} \log \frac{1}{\epsilon})$$. Note that for this example, the gap exists for any fixed update order, not just for one particular order. An immediate consequence is that for the Gauss–Seidel method, the Kaczmarz method, and POCS, there is also an $$\mathcal{O}(n^2)$$ gap between the cyclic and randomized versions (for solving linear systems). One difficulty in the analysis is that the spectral radius of a non-symmetric iteration matrix does not necessarily constitute a lower bound on the convergence rate. Finally, we present numerical experiments showing that the size of the off-diagonal entries is an important indicator of the practical performance of C-CD.
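To make the setting concrete, below is a minimal sketch (not taken from the paper) of exact coordinate minimization on a convex quadratic $$f(x) = \frac{1}{2} x^{\top} A x - b^{\top} x$$, run with either a cyclic or a randomized coordinate order. The function name `coordinate_descent`, its parameters, and the randomly generated test matrix are illustrative assumptions; in particular, the instance here is an arbitrary well-conditioned one, not the worst-case example constructed in the paper.

```python
# Sketch: cyclic vs. randomized coordinate descent on f(x) = 1/2 x^T A x - b^T x.
# The problem instance is an arbitrary SPD matrix, not the paper's worst-case example.
import numpy as np

def coordinate_descent(A, b, num_passes=50, rule="cyclic", seed=0):
    """Exact coordinate minimization: for a chosen coordinate i, set
    x_i = (b_i - sum_{j != i} A_ij x_j) / A_ii,
    which zeroes the i-th component of the gradient A x - b."""
    rng = np.random.default_rng(seed)
    n = len(b)
    x = np.zeros(n)
    for _ in range(num_passes):
        # Cyclic: sweep coordinates 0..n-1 in order.
        # Randomized: draw n coordinates uniformly at random (with replacement).
        order = range(n) if rule == "cyclic" else rng.integers(0, n, size=n)
        for i in order:
            x[i] = (b[i] - A[i, :] @ x + A[i, i] * x[i]) / A[i, i]
    return x

if __name__ == "__main__":
    n = 100
    rng = np.random.default_rng(1)
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)        # symmetric positive definite
    b = rng.standard_normal(n)
    x_star = np.linalg.solve(A, b)
    for rule in ("cyclic", "randomized"):
        x = coordinate_descent(A, b, rule=rule)
        print(f"{rule:10s} error after 50 passes: {np.linalg.norm(x - x_star):.3e}")
```

On a quadratic, one cyclic sweep of this update is exactly one Gauss–Seidel iteration for the linear system $$Ax = b$$, which is the sense in which the abstract identifies C-CD with the Gauss–Seidel method in this setting.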
