Abstract

With the well-documented popularity of Frank Wolfe (FW) algorithms in machine learning tasks, the present paper establishes links between FW subproblems and the notion of momentum emerging in accelerated gradient methods (AGMs). On the one hand, these links reveal why momentum is unlikely to be effective for FW-type algorithms on general problems. On the other hand, it is established that momentum accelerates FW on a class of signal processing and machine learning applications. Specifically, it is proved that a momentum variant of FW, here termed accelerated Frank Wolfe (AFW), converges with a faster rate O(1/k^2) on such a family of problems, despite the same O(1/k) rate of FW on general cases. Distinct from existing fast convergent FW variants, the faster rates here rely on parameter-free step sizes. Numerical experiments on benchmarked machine learning tasks corroborate the theoretical findings.

Highlights

  • We consider efficient means of solving the optimization problem min_{x ∈ X} f(x) (1), where f is a smooth convex function

  • We discuss in Appendix C that accelerated gradient methods (AGMs) for strongly convex problems update their momentum using exactly the same idea as Frank Wolfe (FW): both obtain a minimizer of a lower bound of f(x) and perform an update through a convex combination

  • We build links between the momentum in AGM and the FW step by observing that both minimize a lower bound of the objective function
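The connection above can be made concrete with a small sketch. Below, vanilla FW repeatedly calls a linear minimization oracle (LMO) on the current gradient and moves via a convex combination with the parameter-free step size 2/(k+2); a momentum-style variant instead feeds the LMO a weighted average of past gradients. This is an illustrative sketch on the l1-ball, not the paper's exact AFW update; the momentum weighting shown is an assumption for demonstration.

```python
import numpy as np

def lmo_l1(grad, radius=1.0):
    """Linear minimization oracle over the l1-ball:
    argmin_{||s||_1 <= radius} <grad, s> is a signed vertex."""
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe(grad_f, x0, steps=500):
    """Vanilla FW with the parameter-free step size 2/(k+2)."""
    x = x0.copy()
    for k in range(steps):
        s = lmo_l1(grad_f(x))
        gamma = 2.0 / (k + 2)             # no line search, no tuning
        x = (1 - gamma) * x + gamma * s   # convex combination keeps x feasible
    return x

def momentum_fw(grad_f, x0, steps=500):
    """Illustrative momentum variant: the LMO sees a running
    weighted average of gradients instead of the latest gradient."""
    x, g_bar = x0.copy(), np.zeros_like(x0)
    for k in range(steps):
        delta = 2.0 / (k + 2)
        g_bar = (1 - delta) * g_bar + delta * grad_f(x)  # gradient averaging ("momentum")
        s = lmo_l1(g_bar)
        x = (1 - delta) * x + delta * s
    return x
```

For example, minimizing f(x) = 0.5*||x - b||^2 over the unit l1-ball with b = [0.3, -0.2] (so the minimizer b lies inside the ball), `frank_wolfe(lambda x: x - b, np.zeros(2))` drives the objective toward zero, and every iterate of either method stays feasible because each update is a convex combination of feasible points.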


Summary

A Momentum-Guided Frank-Wolfe Algorithm

With the popularity of Frank Wolfe (FW) algorithms in machine learning tasks, the present paper establishes links between FW subproblems and the notion of momentum emerging in accelerated gradient methods (AGMs). These links reveal why momentum is unlikely to be effective for FW-type algorithms on general problems. It is established that momentum accelerates FW on a class of signal processing and machine learning applications. It is proved that a momentum variant of FW, here termed accelerated Frank Wolfe (AFW), converges with a faster rate O(1/k^2) on such a family of problems, despite the same O(1/k) rate of FW on general cases. Distinct from existing fast convergent FW variants, the faster rates here rely on parameter-free step sizes. Numerical experiments on benchmarked machine learning tasks corroborate the theoretical findings

INTRODUCTION
Related works
Our contributions
PRELIMINARY
CONNECTING MOMENTUM WITH FW
MOMENTUM-GUIDED FW
AFW convergence for general problems
AFW acceleration for a class of problems
NUMERICAL TESTS
Binary classification
Matrix completion
CONCLUSIONS
Proof of Theorem 1
AGM Links with FW in strongly convex case
Proof of Theorem 2
Proof of Theorem 3
