Abstract

In this work we investigate the practicality of stochastic gradient descent and its recently introduced variants with variance-reduction techniques in imaging inverse problems, such as space-varying image deblurring. Such algorithms have been shown in the machine learning literature to have optimal complexities in theory, and to provide significant empirical improvements over full gradient methods. Surprisingly, in some tasks such as image deblurring, many such methods fail to converge faster than the accelerated full gradient method (FISTA), even in terms of epoch counts. We investigate this phenomenon and propose a theory-inspired mechanism to characterize whether a given inverse problem with a known sampling pattern is better suited to stochastic optimization techniques. Furthermore, to overcome another key bottleneck of stochastic optimization, namely the heavy computation of proximal operators, while maintaining fast convergence, we propose an accelerated primal-dual SGD algorithm and demonstrate the effectiveness of our approach in image deblurring experiments.
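
To illustrate the proximal-operator bottleneck mentioned above: in a plain proximal SGD loop (shown below as a generic baseline, not the paper's accelerated primal-dual algorithm), the proximal map of the regularizer is evaluated at every minibatch update, so an expensive prox (e.g. a total-variation regularizer) dominates the per-epoch cost. The following Python sketch assumes placeholder callables grad_f_i and prox_g and a fixed step size; these names and settings are illustrative and do not come from the paper.

    import numpy as np

    def proximal_sgd(x0, grad_f_i, prox_g, n_batches, n_epochs=10, step=1e-3):
        """Generic proximal SGD sketch for min_x (1/n) * sum_i f_i(x) + g(x).

        grad_f_i(x, i) -- stochastic gradient of the i-th minibatch term (assumed given).
        prox_g(z, t)   -- proximal operator of g with step t; note that it is called
                          at EVERY minibatch update, which is the per-iteration
                          bottleneck when the prox itself is expensive.
        """
        x = np.array(x0, dtype=float)
        for _ in range(n_epochs):
            for i in np.random.permutation(n_batches):
                x = prox_g(x - step * grad_f_i(x, i), step)  # prox once per minibatch step
        return x

Reducing exactly this per-iteration proximal cost is the stated motivation for the primal-dual SGD algorithm proposed in the paper.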

Highlights

  • The stochastic gradient methods [1, 2] and recently introduced variants with variance reduction [3, 4, 5] have been widely used to solve large-scale convex optimization problems in machine learning applications

  • In some tasks such as image deblurring, many such methods fail to converge faster than the accelerated full gradient method (FISTA), even in terms of epoch counts. We investigate this phenomenon and propose a theory-inspired mechanism to characterize whether a given inverse problem with a known sampling pattern is better suited to stochastic optimization techniques. To overcome another key bottleneck of stochastic optimization, namely the heavy computation of proximal operators, while maintaining fast convergence, we propose an accelerated primal-dual SGD algorithm and demonstrate the effectiveness of our approach in image deblurring experiments

  • We make the following contributions: (Evaluating the limitation of stochastic gradient algorithms.) We investigate the fundamental limit of possible acceleration of a stochastic gradient method over its full gradient counterpart by measuring the Stochastic Acceleration (SA) factor, which is based on the ratio of the Lipschitz constants of the minibatched stochastic gradient and the full gradient (see the illustrative sketch after this list)

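The Stochastic Acceleration factor described in the last bullet is based on the ratio of the Lipschitz constants of the minibatched stochastic gradient and the full gradient. For a least-squares data fit f_i(x) = ||A_i x - b_i||^2 / 2, these constants are the largest eigenvalues of A_i^T A_i and of (1/n) A^T A, which can be estimated by power iteration. The Python sketch below computes that ratio under this least-squares assumption; the exact definition and normalisation of the SA factor used in the paper may differ.

    import numpy as np

    def lipschitz_lsq(A, n_iter=100):
        """Estimate the largest eigenvalue of A^T A (the Lipschitz constant of
        x -> A^T (A x - b)) by power iteration."""
        v = np.random.randn(A.shape[1])
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            w = A.T @ (A @ v)
            v = w / np.linalg.norm(w)
        return float(v @ (A.T @ (A @ v)))

    def lipschitz_ratio(A_blocks):
        """Ratio of the largest minibatch Lipschitz constant to the full-gradient
        Lipschitz constant for f_i(x) = ||A_i x - b_i||^2 / 2 (illustrative only)."""
        n = len(A_blocks)
        L_full = lipschitz_lsq(np.vstack(A_blocks)) / n   # gradient of (1/n) * sum_i f_i
        L_minibatch = max(lipschitz_lsq(A_i) for A_i in A_blocks)
        return L_minibatch / L_full

Heuristically, the stochastic step size scales with the inverse of the minibatch Lipschitz constant while the full-gradient step scales with the inverse of the full one, so a ratio near its lower bound of 1 leaves room for roughly n-fold per-epoch acceleration, whereas a ratio near n leaves essentially none.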

Summary

INTRODUCTION

The stochastic gradient methods [1, 2] and recently introduced variants with variance reduction [3, 4, 5] have been widely used to solve large-scale convex optimization problems in machine learning applications. Such tasks can be formulated as the composite minimization problem (1). While having been a proven success both in theory and in machine learning applications, there is so far no convincing result in the literature reporting the performance of stochastic gradient methods in image processing applications (except for tomography reconstruction [11, 12, 13]), which involve large-scale optimization tasks of the same form as (1).
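
The explicit form of (1) is not reproduced in this extract; given the references to minibatched stochastic gradients and proximal operators, a standard composite finite-sum formulation of this kind would be (an assumption about the precise notation used in the paper):

    \min_{x \in \mathbb{R}^d} \; F(x) \;=\; \frac{1}{n} \sum_{i=1}^{n} f_i(x) \;+\; g(x),

where each f_i is a smooth data-fidelity term associated with a subset of the measurements and g is a (possibly nonsmooth) regularizer accessed through its proximal operator.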

FAILURES OF STOCHASTIC OPTIMIZATION
LIMITATIONS
Evaluating the Limitation of SGD-type Algorithms
PRACTICAL ACCELERATION FOR SGD
Findings
CONCLUSION
