Abstract

In this work we investigate the practicality of stochastic gradient descent and its recently introduced variants with variance-reduction techniques in imaging inverse problems, such as space-varying image deblurring. Such algorithms have been shown in the machine learning literature to have optimal complexities in theory, and to provide significant empirical improvements over full gradient methods. Surprisingly, in some tasks such as image deblurring, many such methods fail to converge faster than the accelerated full gradient method (FISTA), even in terms of epoch counts. We investigate this phenomenon and propose a theory-inspired mechanism to characterize whether a given inverse problem with a known sampling pattern is better suited to stochastic optimization techniques. Furthermore, to overcome another key bottleneck of stochastic optimization, namely the heavy computation of proximal operators, while maintaining fast convergence, we propose an accelerated primal-dual SGD algorithm and demonstrate the effectiveness of our approach in image deblurring experiments.
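
To illustrate the proximal-operator bottleneck mentioned above: in a plain proximal SGD loop (shown below as a generic baseline, not the paper's accelerated primal-dual algorithm), the proximal map of the regularizer is evaluated at every minibatch update, so an expensive prox (e.g. a total-variation regularizer) dominates the per-epoch cost. The following Python sketch assumes placeholder callables grad_f_i and prox_g and a fixed step size; these names and settings are illustrative and do not come from the paper.

    import numpy as np

    def proximal_sgd(x0, grad_f_i, prox_g, n_batches, n_epochs=10, step=1e-3):
        """Generic proximal SGD sketch for min_x (1/n) * sum_i f_i(x) + g(x).

        grad_f_i(x, i) -- stochastic gradient of the i-th minibatch term (assumed given).
        prox_g(z, t)   -- proximal operator of g with step t; note that it is called
                          at EVERY minibatch update, which is the per-iteration
                          bottleneck when the prox itself is expensive.
        """
        x = np.array(x0, dtype=float)
        for _ in range(n_epochs):
            for i in np.random.permutation(n_batches):
                x = prox_g(x - step * grad_f_i(x, i), step)  # prox once per minibatch step
        return x

Reducing exactly this per-iteration proximal cost is the stated motivation for the primal-dual SGD algorithm proposed in the paper.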

Highlights

  • The stochastic gradient methods [1, 2] and recently introduced variants with variance reduction [3, 4, 5] have been widely used to solve large-scale convex optimization problems in machine learning applications

  • In some tasks such as image deblurring, many such methods fail to converge faster than the accelerated full gradient method (FISTA), even in terms of epoch counts. We investigate this phenomenon and propose a theory-inspired mechanism to characterize whether a given inverse problem with a known sampling pattern is better suited to stochastic optimization techniques. To overcome another key bottleneck of stochastic optimization, namely the heavy computation of proximal operators, while maintaining fast convergence, we propose an accelerated primal-dual SGD algorithm and demonstrate the effectiveness of our approach in image deblurring experiments

  • We make the following contributions: (Evaluating the limitation of stochastic gradient algorithms.) We investigate the fundamental limit of possible acceleration of a stochastic gradient method over its full gradient counterpart by measuring the Stochastic Acceleration (SA) factor, which is based on the ratio of the Lipschitz constants of the minibatched stochastic gradient and the full gradient (see the illustrative sketch after this list)

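The Stochastic Acceleration factor described in the last bullet is based on the ratio of the Lipschitz constants of the minibatched stochastic gradient and the full gradient. For a least-squares data fit f_i(x) = ||A_i x - b_i||^2 / 2, these constants are the largest eigenvalues of A_i^T A_i and of (1/n) A^T A, which can be estimated by power iteration. The Python sketch below computes that ratio under this least-squares assumption; the exact definition and normalisation of the SA factor used in the paper may differ.

    import numpy as np

    def lipschitz_lsq(A, n_iter=100):
        """Estimate the largest eigenvalue of A^T A (the Lipschitz constant of
        x -> A^T (A x - b)) by power iteration."""
        v = np.random.randn(A.shape[1])
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            w = A.T @ (A @ v)
            v = w / np.linalg.norm(w)
        return float(v @ (A.T @ (A @ v)))

    def lipschitz_ratio(A_blocks):
        """Ratio of the largest minibatch Lipschitz constant to the full-gradient
        Lipschitz constant for f_i(x) = ||A_i x - b_i||^2 / 2 (illustrative only)."""
        n = len(A_blocks)
        L_full = lipschitz_lsq(np.vstack(A_blocks)) / n   # gradient of (1/n) * sum_i f_i
        L_minibatch = max(lipschitz_lsq(A_i) for A_i in A_blocks)
        return L_minibatch / L_full

Heuristically, the stochastic step size scales with the inverse of the minibatch Lipschitz constant while the full-gradient step scales with the inverse of the full one, so a ratio near its lower bound of 1 leaves room for roughly n-fold per-epoch acceleration, whereas a ratio near n leaves essentially none.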

Summary

INTRODUCTION

The stochastic gradient methods [1, 2] and recently introduced variants with variance reduction [3, 4, 5] have been widely used to solve large-scale convex optimization problems in machine learning applications. Such tasks can be formulated as the composite minimization problem (1). While having been a proven success both in theory and in machine learning applications, there is so far no convincing result in the literature reporting the performance of stochastic gradient methods in image processing applications (except for tomography reconstruction [11, 12, 13]), which involve large-scale optimization tasks of the same form as (1).
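
The explicit form of (1) is not reproduced in this extract; given the references to minibatched stochastic gradients and proximal operators, a standard composite finite-sum formulation of this kind would be (an assumption about the precise notation used in the paper):

    \min_{x \in \mathbb{R}^d} \; F(x) \;=\; \frac{1}{n} \sum_{i=1}^{n} f_i(x) \;+\; g(x),

where each f_i is a smooth data-fidelity term associated with a subset of the measurements and g is a (possibly nonsmooth) regularizer accessed through its proximal operator.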

FAILURES OF STOCHASTIC OPTIMIZATION
LIMITATIONS
Evaluating the Limitation of SGD-type Algorithms
PRACTICAL ACCELERATION FOR SGD
Findings
CONCLUSION
