Online Learning Algorithm for Distributed Convex Optimization With Time-Varying Coupled Constraints and Bandit Feedback.

Jueyou Li,Zhiyou Wu,Chuanye Gu,Tingwen Huang

doi:10.1109/tcyb.2020.2990796

Abstract

This article focuses on multiagent distributed-constrained optimization problems in a dynamic environment, in which a group of agents aims to cooperatively optimize a sum of time-changing local cost functions subject to time-varying coupled constraints. Both the local cost functions and constraint functions are unrevealed to an individual agent until an action is submitted. We first investigate a gradient-feedback scenario, where each agent can access both values and gradients of cost functions and constraint functions owned by itself at the chosen action. Then, we design a distributed primal-dual online learning algorithm and show that the proposed algorithm can achieve the sublinear bounds for both the regret and constraint violations. Furthermore, we extend the gradient-feedback algorithm to a gradient-free setup, where an individual agent has only attained the values of local cost functions and constraint functions at two queried points near the selected action. We develop a bandit version of the previous method and give the explicitly sublinear bounds on the expected regret and expected constraint violations. The results indicate that the bandit algorithm can achieve almost the same performance as the gradient-feedback algorithm under wild conditions. Finally, numerical simulations on an electric vehicle charging problem demonstrate the effectiveness of the proposed algorithms.

Full Text