Abstract

Invented some 65 years ago in a seminal paper by Marguerite Straus-Frank and Philip Wolfe, the Frank–Wolfe method has recently enjoyed a remarkable revival, fuelled by the need for fast and reliable first-order optimization methods in Data Science and other relevant application areas. This review aims to explain the success of the approach by illustrating its versatility and applicability in a wide range of contexts, combined with an account of recent progress on variants that improve both the speed and the efficiency of this surprisingly simple first-order optimization principle.

Highlights

  • In their seminal work (Frank and Wolfe 1956), Marguerite Straus-Frank and Philip Wolfe introduced a first-order algorithm for the minimization of convex quadratic objectives over polytopes, known as the Frank–Wolfe (FW) method

  • In Levitin and Polyak (1966) and Demyanov and Rubinov (1970), a linear convergence rate was proved over strongly convex domains assuming a lower bound on the gradient norm, a result extended in Dunn (1979) under more general gradient inequalities

  • In Guélat and Marcotte (1986), linear convergence of the method was proved for strongly convex objectives with the minimum attained in the relative interior of the feasible set


Summary

Introduction

In their seminal work (Frank and Wolfe 1956), Marguerite Straus-Frank and Philip Wolfe introduced a first-order algorithm for the minimization of convex quadratic objectives over polytopes, known as the Frank–Wolfe (FW) method. Wolfe’s idea was to move away from bad vertices whenever an FW step towards good vertices did not lead to sufficient improvement in the objective. This idea was successfully applied in several network equilibrium problems, where linear minimization can be carried out by solving a min-cost flow problem (see Fukushima 1984 and references therein). In applications such as cluster detection (see, e.g., Bomze 1997), finding the support of the solution is enough to solve the problem, independently of the precision achieved. Another important feature is that the linear minimization used in the method is often cheaper than the projections required by projected-gradient methods. The method can also be used to approximately solve quadratic subproblems in accelerated schemes, an approach usually referred to as conditional gradient sliding (see, e.g., Carderera and Pokutta 2020; Lan and Zhou 2016).
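As a minimal illustration (not taken from the paper), the classical FW iteration can be sketched in a few lines of Python for minimization over the probability simplex, where the linear minimization oracle reduces to picking the vertex with the most negative gradient coordinate; the 2/(k+2) step size and the function names used here are standard textbook choices, shown purely for illustration.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, max_iter=200, tol=1e-6):
    """Sketch of the classical Frank-Wolfe iteration on the probability simplex.

    grad : callable returning the gradient of the (convex) objective at x.
    The linear minimization oracle over the simplex just picks the vertex
    e_i with the most negative gradient coordinate -- far cheaper than a
    projection step would be for many feasible sets.
    """
    x = x0.copy()
    for k in range(max_iter):
        g = grad(x)
        i = np.argmin(g)                   # LMO: argmin over vertices of <g, s>
        s = np.zeros_like(x)
        s[i] = 1.0
        fw_gap = g @ (x - s)               # FW gap, an upper bound on f(x) - f*
        if fw_gap <= tol:
            break
        gamma = 2.0 / (k + 2.0)            # classical diminishing step size
        x = (1.0 - gamma) * x + gamma * s  # convex combination stays feasible
    return x

# Example (illustrative): project a point y onto the simplex,
# i.e. minimize 0.5 * ||x - y||^2 whose gradient is x - y.
y = np.array([0.3, 1.2, -0.5, 0.8])
x_star = frank_wolfe_simplex(lambda x: x - y, np.ones(4) / 4)
```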

Organisation of the paper
Notation
The classical Frank–Wolfe method
Examples
Traffic assignment
Submodular optimization
LASSO problem
Matrix completion
Adversarial attacks in machine learning
Minimum enclosing ball
Training linear Support Vector Machines
Finding maximal cliques in graphs
Finding sparse points in a set
Stepsizes
The FW gap
Variants
Sparse approximation properties
Affine invariance
Support identification for the AFW
Inexact linear oracle
Linear convergence under an angle condition
Objective
Strongly convex domains
Block coordinate Frank–Wolfe method
Variants for the min-norm point problem
Variants for optimization over the trace norm ball
Conclusions