Abstract
Invented some 65 years ago in a seminal paper by Marguerite Straus-Frank and Philip Wolfe, the Frank–Wolfe method has recently enjoyed a remarkable revival, fuelled by the need for fast and reliable first-order optimization methods in Data Science and other relevant application areas. This review tries to explain the success of this approach by illustrating its versatility and applicability in a wide range of contexts, combined with an account of recent progress in variants that improve both the speed and efficiency of this surprisingly simple principle of first-order optimization.
Highlights
In their seminal work (Frank and Wolfe 1956), Marguerite Straus-Frank and Philip Wolfe introduced a first-order algorithm for the minimization of convex quadratic objectives over polytopes, known as the Frank–Wolfe (FW) method; a minimal sketch of one iteration is given after these highlights.
In Levitin and Polyak (1966) and Demyanov and Rubinov (1970), a linear convergence rate was proved over strongly convex domains, assuming a lower bound on the gradient norm; this result was extended in Dunn (1979) under more general gradient inequalities.
In Guélat and Marcotte (1986), linear convergence of the method was proved for strongly convex objectives with the minimum attained in the relative interior of the feasible set.
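To make the basic scheme concrete, here is a minimal Python sketch of one classical FW iteration for a convex quadratic over the probability simplex. The quadratic data (Q, b), the simplex feasible set, and the 2/(k+2) step size are illustrative assumptions made here, not details taken from the references above; the method only needs a linear minimization oracle over the feasible set.

```python
# Illustrative sketch of the classical Frank–Wolfe (conditional gradient) method
# for f(x) = 0.5 x'Qx + b'x over the probability simplex (assumed setting).
import numpy as np

def frank_wolfe_simplex(Q, b, n_iters=200):
    n = Q.shape[0]
    x = np.ones(n) / n                       # feasible starting point
    for k in range(n_iters):
        grad = Q @ x + b                     # gradient of f(x) = 0.5 x'Qx + b'x
        # Linear minimization oracle: over the simplex, <grad, s> is minimized
        # at the vertex (standard basis vector) of the smallest gradient entry.
        s = np.zeros(n)
        s[np.argmin(grad)] = 1.0
        gamma = 2.0 / (k + 2)                # classical open-loop step size
        x = x + gamma * (s - x)              # convex combination stays feasible
    return x

# Example usage with a random positive semidefinite Q (illustrative data).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Q, b = A.T @ A, rng.standard_normal(5)
print(frank_wolfe_simplex(Q, b))
```

Over the simplex, the linear minimization oracle reduces to picking the coordinate with the smallest gradient entry, which illustrates why an FW iteration can be much cheaper than a projection onto the feasible set.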
Summary
In their seminal work (Frank and Wolfe 1956), Marguerite Straus-Frank and Philip Wolfe introduced a first-order algorithm for the minimization of convex quadratic objectives over polytopes, known as the Frank–Wolfe (FW) method. In a later refinement, Wolfe's idea was to move away from bad vertices whenever a step of the FW method moving towards good vertices did not lead to sufficient improvement in the objective. This idea was successfully applied in several network equilibrium problems, where linear minimization can be achieved by solving a min-cost flow problem (see Fukushima 1984 and references therein). In some applications, as in cluster detection (see, e.g., Bomze 1997), finding the support of the solution is enough to solve the problem, independently of the precision achieved. Another important feature is that the linear minimization used in the method is often cheaper than the projections required by projected-gradient methods. The method can also be used to approximately solve quadratic subproblems in accelerated schemes, an approach usually referred to as conditional gradient sliding (see, e.g., Carderera and Pokutta 2020; Lan and Zhou 2016).
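The away-step idea mentioned above can be sketched in the same toy setting as before. The following Python fragment is an illustrative sketch, not the authors' implementation: the simplex feasible set, the exact line search, and the function name away_step_fw_simplex are assumptions made here for concreteness.

```python
# Hedged sketch of an away-step FW variant on a simplex-constrained quadratic:
# when the plain "toward" step is less promising, the iterate moves away from
# the worst active vertex instead (assumed setting, not the review's code).
import numpy as np

def away_step_fw_simplex(Q, b, n_iters=500, tol=1e-10):
    n = Q.shape[0]
    x = np.zeros(n)
    x[0] = 1.0                                   # start at a vertex of the simplex
    for _ in range(n_iters):
        grad = Q @ x + b                         # gradient of f(x) = 0.5 x'Qx + b'x
        # Toward direction: best vertex returned by the linear minimization oracle.
        s = np.zeros(n)
        s[np.argmin(grad)] = 1.0
        d_fw = s - x
        if grad @ d_fw > -tol:                   # Frank–Wolfe gap ~ 0: stop
            break
        # Away direction: leave the active vertex with the largest gradient entry.
        active = np.flatnonzero(x > tol)
        v = active[np.argmax(grad[active])]
        d_away = x.copy()
        d_away[v] -= 1.0                         # d_away = x - e_v
        # Choose the direction better aligned with the negative gradient.
        if -(grad @ d_fw) >= -(grad @ d_away):
            d, gamma_max = d_fw, 1.0
        else:
            d = d_away
            gamma_max = x[v] / (1.0 - x[v]) if x[v] < 1.0 else 0.0
        # Exact line search for the quadratic, clipped to keep the iterate feasible.
        curvature = d @ Q @ d
        gamma = gamma_max if curvature <= tol else min(gamma_max, -(grad @ d) / curvature)
        x = x + gamma * d
    return x

# Example usage with a small positive semidefinite Q (illustrative data).
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
Q, b = A.T @ A, rng.standard_normal(5)
print(away_step_fw_simplex(Q, b))
```

The key ingredient is the maximal away-step size x[v] / (1 - x[v]), which can drive the weight of the "bad" vertex to zero while keeping the iterate inside the feasible set.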