Stochastic Mirror Research Articles

We consider distributionally robust optimization (DRO) problems, reformulated as distributionally robust feasibility (DRF) problems, with multiple expectation constraints. We propose a generic stochastic first-order meta-algorithm, where the decision variables and uncertain distribution parameters are each updated separately by applying stochastic first-order methods. We then specialize our results to the case of using two specific versions of stochastic mirror descent (SMD): (i) a novel approximate version of SMD to update the decision variables, and (ii) the bandit mirror descent method to update the distribution parameters in the case of [Formula: see text]-divergence sets. For this specialization, we demonstrate that the total number of iterations is independent of the dimensions of the decision variables and distribution parameters. Moreover, the cost per iteration to update both sets of variables is nearly independent of the dimension of the distribution parameters, allowing for high-dimensional ambiguity sets. Furthermore, we show that the total number of iterations of our algorithm has a logarithmic dependence on the number of constraints. Experiments on logistic regression with fairness constraints, personalized parameter selection in a social network, and the multi-item newsvendor problem verify the theoretical results and show the usefulness of the algorithm, in particular when the dimension of the distribution parameters is large.

History: Accepted by Antonio Frangioni, Area Editor for Design & Analysis of Algorithms—Continuous.

Funding: This work was supported by the National Science Foundation [Grant 2112533].

Supplemental Material: The online appendix is available at https://doi.org/10.1287/ijoc.2023.0167

Most modern learning problems are highly overparameterized, i.e., have many more model parameters than the number of training data points. As a result, the training loss may have infinitely many global minima (parameter vectors that perfectly "interpolate" the training data). It is therefore imperative to understand which interpolating solutions we converge to, how they depend on the initialization and learning algorithm, and whether they yield different test errors. In this article, we study these questions for the family of stochastic mirror descent (SMD) algorithms, of which stochastic gradient descent (SGD) is a special case. Recently, it has been shown that for overparameterized linear models, SMD converges to the global minimum closest to the initialization point, where closeness is measured by the Bregman divergence corresponding to the potential function of the mirror descent. With appropriate initialization, this yields convergence to the minimum-potential interpolating solution, a phenomenon referred to as implicit regularization. On the theory side, we show that for sufficiently overparameterized nonlinear models, SMD with a (small enough) fixed step size converges to a global minimum that is "very close" (in Bregman divergence) to the minimum-potential interpolating solution, thus attaining approximate implicit regularization. On the empirical side, our experiments on the MNIST and CIFAR-10 datasets consistently confirm that the above phenomenon occurs in practical scenarios. They further indicate a clear difference in the generalization performance of different SMD algorithms: experiments on the CIFAR-10 dataset with different regularizers (ℓ1 to encourage sparsity; ℓ2, which corresponds to SGD, to encourage small Euclidean norm; and ℓ∞ to discourage large components) show, surprisingly, that the ℓ∞ norm consistently yields better generalization performance than SGD, which in turn generalizes better than the ℓ1 norm.
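
The linear-model result mentioned in this abstract can be illustrated with a small sketch. The code below is not the authors' implementation or experimental setup: the function names (`mirror_map`, `inverse_mirror_map`, `smd_linear`), the q-norm potential psi(w) = (1/q) * sum(|w_i|^q), the squared loss, the step-size rule, and the problem sizes are all assumptions chosen for illustration. With q = 2 the mirror map is the identity, so the update reduces to SGD, and starting from w = 0 on an overparameterized linear system the iterates should approach the minimum Euclidean-norm interpolating solution.

```python
import numpy as np


def mirror_map(w, q):
    """Gradient of the potential psi(w) = (1/q) * sum(|w_i|**q), for q > 1."""
    return np.sign(w) * np.abs(w) ** (q - 1)


def inverse_mirror_map(z, q):
    """Inverse of mirror_map: recovers w from the mirror-domain variable z."""
    return np.sign(z) * np.abs(z) ** (1.0 / (q - 1))


def smd_linear(X, y, q=2.0, step=None, n_steps=20000, seed=0):
    """Run SMD on the squared loss of a linear model, one sample per step.

    Starts from w = 0 and uses a fixed step size.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if step is None:
        # Conservative fixed step size (an assumption of this sketch).
        step = 1.0 / np.max(np.sum(X ** 2, axis=1))
    w = np.zeros(d)
    z = mirror_map(w, q)
    for _ in range(n_steps):
        i = rng.integers(n)
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (x_i . w - y_i)**2
        z = z - step * grad              # gradient step in the mirror domain
        w = inverse_mirror_map(z, q)     # map back to the parameter domain
    return w


# Overparameterized toy problem: 5 data points, 20 parameters.
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 20))
y = rng.standard_normal(5)

w_sgd = smd_linear(X, y, q=2.0)
w_min_norm = np.linalg.pinv(X) @ y  # minimum Euclidean-norm interpolant
print("training residual:", np.linalg.norm(X @ w_sgd - y))
print("distance to min-norm interpolant:", np.linalg.norm(w_sgd - w_min_norm))
```

Other values of `q` change the potential and therefore which interpolating solution is favored; whether a given fixed step size is small enough for those cases is not checked by this sketch.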

Articles published on Stochastic Mirror

Stochastic Mirror Descent for Convex Optimization with Consensus Constraints

Stochastic First-Order Algorithms for Constrained Distributionally Robust Optimization

Gossip-based distributed stochastic mirror descent for constrained optimization

Event-Triggered Distributed Stochastic Mirror Descent for Convex Optimization

Stochastic mirror descent method for linear ill-posed problems in Banach spaces

Faster randomized block sparse Kaczmarz by averaging

Stochastic Mirror Descent on Overparameterized Nonlinear Models

Sparse recovery by reduced variance stochastic approximation

Nonasymptotic Concentration Rates in Cooperative Learning—Part II: Inference on Compact Hypothesis Sets

Nonasymptotic Concentration Rates in Cooperative Learning–Part I: Variational Non-Bayesian Social Learning

Variance reduction on general adaptive stochastic mirror descent

Policy Optimization with Stochastic Mirror Descent

Stochastic Mirror Descent for Low-Rank Tensor Decomposition Under Non-Euclidean Losses

Sparse Representations of Positive Functions via First- and Second-Order Pseudo-Mirror Descent

Fastest rates for stochastic mirror descent methods

Low-Depth Gradient Measurements Can Improve Convergence in Variational Hybrid Quantum-Classical Algorithms

On stochastic mirror descent with interacting particles: Convergence properties and variance reduction

Distributed stochastic mirror descent algorithm for resource allocation problem

Inexact stochastic mirror descent for two-stage nonlinear stochastic programs

“Relative Continuity” for Non-Lipschitz Nonsmooth Convex Optimization Using Stochastic (or Deterministic) Mirror Descent
