Abstract
With the well-documented popularity of Frank Wolfe (FW) algorithms in machine learning tasks, the present paper establishes links between FW subproblems and the notion of momentum emerging in accelerated gradient methods (AGMs). On the one hand, these links reveal why momentum is unlikely to be effective for FW-type algorithms on general problems. On the other hand, it is established that momentum accelerates FW on a class of signal processing and machine learning applications. Specifically, it is proved that a momentum variant of FW, here termed accelerated Frank Wolfe (AFW), converges with a faster rate O(1/k²) on such a family of problems, despite the same O(1/k) rate of FW on general cases. Distinct from existing fast convergent FW variants, the faster rates here rely on parameter-free step sizes. Numerical experiments on benchmarked machine learning tasks corroborate the theoretical findings.
Highlights
We consider efficient means of solving the following optimization problem: min_{x ∈ X} f(x), where f is a smooth convex function and X is the constraint set
We discuss in Appendix C that accelerated gradient methods (AGMs) for strongly convex problems update their momentum using exactly the same idea as Frank Wolfe (FW); that is, both obtain a minimizer of a lower bound of f(x) and perform an update through a convex combination
We build links between the momentum in AGM and the FW step by observing that they both minimize a lower bound of the objective function
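The FW step described above can be sketched concretely: each iteration minimizes a linear lower bound of f over the constraint set (the linear minimization oracle, LMO) and moves via a convex combination with the parameter-free step size 2/(k+2). The snippet below is a minimal illustration, not the paper's AFW variant; the least-squares objective, the ℓ1-ball constraint, and all data are hypothetical placeholders chosen so the LMO has a simple closed form.

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, num_iters=200):
    """Vanilla Frank-Wolfe with the parameter-free step size delta_k = 2/(k+2).

    Each iteration minimizes the linear lower bound
    f(x_k) + <grad f(x_k), v - x_k> over the constraint set via the LMO,
    then takes a convex combination of x_k and the minimizer v_k.
    """
    x = x0.copy()
    for k in range(num_iters):
        v = lmo(grad(x))                 # v_k = argmin_{v in X} <grad f(x_k), v>
        delta = 2.0 / (k + 2)            # parameter-free step size
        x = (1 - delta) * x + delta * v  # convex combination keeps x in X
    return x

# Hypothetical example: least squares over the l1 ball of radius r.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
r = 1.0

grad = lambda x: A.T @ (A @ x - b)

def l1_ball_lmo(g, radius=r):
    # Minimizing <g, v> over the l1 ball yields a signed, scaled coordinate vector.
    i = np.argmax(np.abs(g))
    v = np.zeros_like(g)
    v[i] = -radius * np.sign(g[i])
    return v

x_fw = frank_wolfe(grad, l1_ball_lmo, np.zeros(10))
```

Because every iterate is a convex combination of points in X, the output stays feasible without any projection step, which is what makes FW attractive for structured constraint sets.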
Summary
With the well-documented popularity of Frank Wolfe (FW) algorithms in machine learning tasks, the present paper establishes links between FW subproblems and the notion of momentum emerging in accelerated gradient methods (AGMs). These links reveal why momentum is unlikely to be effective for FW-type algorithms on general problems. It is established that momentum accelerates FW on a class of signal processing and machine learning applications. It is proved that a momentum variant of FW, here termed accelerated Frank Wolfe (AFW), converges with a faster rate O(1/k²) on such a family of problems, despite the same O(1/k) rate of FW on general cases. Distinct from existing fast convergent FW variants, the faster rates here rely on parameter-free step sizes. Numerical experiments on benchmarked machine learning tasks corroborate the theoretical findings