Abstract

Online convex optimization (OCO) encapsulates supervised learning when training sets are large-scale or dynamic, and has grown essential as data has proliferated. OCO decomposes learning into a sequence of sub-problems, each of which must be solved with limited information. To ensure safe model adaptation or to avoid overfitting, constraints are often imposed, typically handled via projections or Lagrangian relaxation. To avoid this added complexity, we propose to study Frank-Wolfe (FW), which sidesteps projections by moving toward the minimizer of a linearization of the objective over the feasible set, so that its iterates are guaranteed to remain feasible. We specifically focus on its use in non-stationary settings, motivated by the fact that its iterates exhibit structured sparsity that may be employed as a distribution-free change-point detector. We establish performance in terms of dynamic regret, which quantifies cumulative cost relative to the instantaneous optimum at each time slot. Specifically, for convex losses, we establish O(T^{1/2}) dynamic regret up to metrics of non-stationarity. We relax the algorithm's required information to only noisy gradient estimates, i.e., partial feedback. We also consider a mini-batched 'Meta-Frank-Wolfe' and characterize its dynamic regret. Experiments on matrix completion and background separation in video demonstrate favorable performance of the proposed scheme. Moreover, the structured sparsity of FW is experimentally observed to yield the sharpest tracking of change points among alternative approaches to non-stationary online convex optimization.
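As a concrete illustration (not the paper's exact algorithm), the sketch below shows a projection-free online Frank-Wolfe update over an l1 ball: the linear minimization oracle returns a 1-sparse vertex of the ball, and the convex-combination step keeps every iterate feasible without a projection. The gradient is corrupted with noise to mimic the partial-feedback setting; the loss, ball radius, step-size schedule, and drifting target are illustrative assumptions.

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle over {x : ||x||_1 <= radius}:
    the minimizer of <grad, s> is a signed, 1-sparse vertex of the ball."""
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

def online_fw_step(x, grad, t, radius=1.0):
    """One projection-free update: move toward the LMO vertex;
    the convex combination keeps the iterate inside the feasible set."""
    s = lmo_l1_ball(grad, radius)
    gamma = 2.0 / (t + 2)  # illustrative diminishing step size
    return (1.0 - gamma) * x + gamma * s

# Toy usage: track a slowly drifting least-squares target with noisy gradients.
rng = np.random.default_rng(0)
d, T = 20, 200
x = np.zeros(d)
for t in range(T):
    target = np.sin(2 * np.pi * t / T) / d * np.ones(d)           # drifting optimum
    noisy_grad = (x - target) + 0.05 * rng.standard_normal(d)     # grad of 0.5*||x - target||^2 plus noise
    x = online_fw_step(x, noisy_grad, t, radius=1.0)
```

Because each LMO output is 1-sparse, the support of the iterates changes abruptly when the underlying optimum shifts, which is the structured-sparsity signal the abstract describes exploiting as a change-point indicator.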
