Bandit Feedback Research Articles

Decentralized optimization methods often entail information exchange between neighbors. In many circumstances, due to the limited communication bandwidth, the exchanged information has to be quantized. In this paper, we investigate the impact of quantization on the performance of decentralized stochastic optimization. We consider a multi-agent network, in which each node is associated with a stochastic cost and each pair of neighbors is associated with a constraint. The goal of the network is to minimize the aggregate expected cost subject to all the constraints. We first develop a decentralized stochastic saddle point algorithm with quantized communications for the scenario of sample feedback, in which a sample of the random variable in the stochastic cost function is revealed to each node at each time. We establish performance bounds for the expected cost suboptimality and expected constraint violations of the algorithm in terms of the quantization intervals. We show that asymptotic optimality can be achieved if the quantization intervals approach zero. On the other hand, if the quantization intervals are fixed and nonzero, then non-diminishing performance gaps exist, which indicate a tradeoff between optimization performance and communication overhead. We further extend the algorithm and analysis to the scenario of bandit feedback, where samples of the random variables are not available and only the values of the cost function at two random points close to the current decision are disclosed. We show that the performance of the algorithm does not degrade in order sense compared to its counterpart with sample feedback. Finally, two numerical examples, namely decentralized quadratically constrained quadratic program and decentralized logistic regression, are presented to corroborate the efficacy of the proposed algorithms.
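As a rough illustration of the two ingredients this abstract mentions, the sketch below shows a symmetric two-point gradient estimate (bandit feedback) and an unbiased randomized quantizer. Both are common textbook constructions and are assumptions here, not the paper's exact estimator or quantization rule; the names `two_point_gradient`, `stochastic_quantize`, `delta`, and `step` are hypothetical.

```python
import numpy as np


def two_point_gradient(f, x, delta, rng):
    """Estimate the gradient of f at x from two function values near x.

    Assumed form: a symmetric difference along a random unit direction.
    No gradient and no sample of the random variable is observed.
    """
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # uniformly distributed direction on the unit sphere
    # Two queries, both within distance delta of the current decision x.
    diff = f(x + delta * u) - f(x - delta * u)
    return (d / (2.0 * delta)) * diff * u


def stochastic_quantize(x, step, rng):
    """Round x onto a grid of spacing `step`, unbiased in expectation.

    Assumed scheme: round up with probability proportional to the distance
    from the lower grid point. A smaller `step` means finer quantization
    and more bits per exchanged message.
    """
    low = np.floor(x / step) * step
    p_up = (x - low) / step
    return low + step * (rng.random(x.shape) < p_up)
```

In a decentralized loop, each node would form its local estimate with `two_point_gradient` and pass the result of `stochastic_quantize` to its neighbors; keeping `step` fixed and nonzero is the setting in which the abstract reports a non-diminishing performance gap.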

In this article, we consider distributed online convex optimization problems, where the distributed system consists of various computing units connected through a time-varying communication graph. In each time step, each computing unit selects a constrained vector, experiences a loss equal to an arbitrary convex function evaluated at this vector, and may communicate to its neighbors in the graph. The objective is to minimize the system-wide loss accumulated over time. We propose a decentralized algorithm with regret and cumulative constraint violation in ${\mathcal O}(T^{\max \lbrace c,1-c\rbrace })$ and ${\mathcal O}(T^{1-c/2})$, respectively, for any $c\in (0,1)$, where $T$ is the time horizon. When the loss functions are strongly convex, we establish improved regret and constraint violation upper bounds in ${\mathcal O}(\log (T))$ and ${\mathcal O}(\sqrt{T\log (T)})$.
These regret scalings match those obtained by state-of-the-art algorithms and fundamental limits in the corresponding centralized online optimization problem (for both convex and strongly convex loss functions). In the case of bandit feedback, the proposed algorithms achieve a regret and constraint violation in ${\mathcal O}(T^{\max \lbrace c,1-c/3 \rbrace })$ and ${\mathcal O}(T^{1-c/2})$ for any $c\in (0,1)$. We numerically illustrate the performance of our algorithms for the particular case of distributed online regularized linear regression problems on synthetic and real data.
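
As a worked reading of these bounds (simple arithmetic on the stated exponents, not a claim taken from the paper), the parameter $c$ trades regret against constraint violation. Choosing $c$ to minimize the regret exponent gives:

$$
\begin{aligned}
\text{full information:}\quad & \min_{c\in(0,1)} \max\lbrace c,\,1-c\rbrace = \tfrac{1}{2} \ \text{at } c=\tfrac{1}{2}, && \text{violation exponent } 1-\tfrac{c}{2} = \tfrac{3}{4},\\
\text{bandit feedback:}\quad & \min_{c\in(0,1)} \max\lbrace c,\,1-\tfrac{c}{3}\rbrace = \tfrac{3}{4} \ \text{at } c=\tfrac{3}{4}, && \text{violation exponent } 1-\tfrac{c}{2} = \tfrac{5}{8}.
\end{aligned}
$$

So the regret-optimal choices would correspond to ${\mathcal O}(\sqrt{T})$ regret with ${\mathcal O}(T^{3/4})$ violation under full information, and ${\mathcal O}(T^{3/4})$ regret with ${\mathcal O}(T^{5/8})$ violation under bandit feedback. Taking $c$ larger than these values lowers the violation exponent at the cost of a larger regret exponent.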

Articles published on Bandit Feedback

Multinomial Logit Contextual Bandits: Provable Optimality and Practicality

Policy Optimization as Online Learning with Mediator Feedback

A note on the price of bandit feedback for mistake-bounded online learning

Decentralized online convex optimization based on signs of relative states

Shrinking the Upper Confidence Bound: A Dynamic Product Selection Problem for Urban Warehouses

Optimal No-Regret Learning in Strongly Monotone Games with Bandit Feedback

Decentralized Online Convex Optimization With Event-Triggered Communications

Distributed Online Linear Regressions

Push-Sum Distributed Online Optimization With Bandit Feedback

Online Learning Algorithm for Distributed Convex Optimization With Time-Varying Coupled Constraints and Bandit Feedback

Distributed Mirror Descent for Online Composite Optimization

On Adaptivity in Information-Constrained Online Learning

Tuning of multivariable model predictive controllers through expert bandit feedback

Decentralized Multi-Agent Stochastic Optimization With Pairwise Constraints and Quantized Communications

Distributed Online Optimization With Long-Term Constraints

Learning to Control Renewal Processes with Bandit Feedback

A Bayesian Optimization Approach to Compute Nash Equilibrium of Potential Games Using Bandit Feedback

New bounds on the price of bandit feedback for mistake-bounded online multiclass learning

Efficient Counterfactual Learning from Bandit Feedback

Online Convex Optimization With Time-Varying Constraints and Bandit Feedback
