Feasible Online Learning with Gradient Feedback

Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions: (1) in the single-agent setting, it achieves an optimal regret of $\Theta(\log T)$ for strongly convex cost functions; and (2) in the multiagent setting of strongly monotone games, when each agent employs OGD, the joint action converges in a last-iterate sense to the unique Nash equilibrium at an optimal rate of $\Theta(1/T)$. Although these finite-time guarantees highlight its merits, OGD has the drawback of requiring a priori knowledge of the strong convexity/monotonicity parameters. In "Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback," M. Jordan, T. Lin, and Z. Zhou design a fully adaptive OGD algorithm, AdaOGD, that does not require such a priori knowledge. In the single-agent setting, the algorithm achieves $O(\log^2 T)$ regret under strong convexity, which is optimal up to a log factor. Further, if each agent employs AdaOGD in a strongly monotone game, the joint action converges in a last-iterate sense to the unique Nash equilibrium at a rate of $O(\log^3 T / T)$, again optimal up to log factors. The algorithms are illustrated in a learning version of the classic newsvendor problem, in which, because of lost sales, only (noisy) gradient feedback can be observed. The results immediately yield the first feasible and near-optimal algorithm for both the single-retailer and multiretailer settings.
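To make the feedback model concrete, here is a minimal single-retailer sketch of the non-adaptive baseline: projected OGD with the classic $1/(\mu t)$ step size, which is exactly where a priori knowledge of the strong-convexity parameter $\mu$ enters (the requirement AdaOGD is designed to remove). The cost parameters, demand distribution, and value of $\mu$ below are illustrative assumptions, not taken from the paper; the point of the sketch is that the stockout indicator yields an unbiased stochastic gradient even though lost sales censor the demand.

```python
import numpy as np

rng = np.random.default_rng(0)

# Newsvendor with lost sales: underage cost c_u, overage cost c_o.
# Demand is censored (only sales = min(demand, q) are observed), but the
# stockout indicator is still observable and gives an unbiased stochastic
# (sub)gradient of the expected cost:
#   g_t = c_o * 1{demand < q_t} - c_u * 1{demand >= q_t}.
c_u, c_o = 2.0, 1.0            # hypothetical cost parameters
q_lo, q_hi = 0.0, 100.0        # feasible order-quantity interval
mu = 0.05                      # strong-convexity parameter, assumed KNOWN here
                               # (precisely the knowledge AdaOGD dispenses with)

q = 50.0                       # initial order quantity
T = 10_000
for t in range(1, T + 1):
    demand = rng.gamma(shape=4.0, scale=10.0)    # hypothetical demand model
    stockout = demand >= q                       # observable despite censoring
    g = -c_u if stockout else c_o                # noisy gradient feedback
    eta = 1.0 / (mu * t)                         # classic O(log T) step size
    q = float(np.clip(q - eta * g, q_lo, q_hi))  # projected OGD step

print(f"final order quantity after {T} rounds: {q:.2f}")
```

Replacing the fixed $1/(\mu t)$ schedule with a data-driven one is the substance of AdaOGD; its exact step-size rule and guarantees are given in the paper.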