Online bandit convex optimisation with stochastic constraints via two-point feedback

Guo Chen, Jichi Yu, Jueyou Li

doi:10.1080/00207721.2023.2209566

Abstract

In this paper, we investigate an online convex optimisation problem with stochastic constraints in the bandit setup. We are particularly interested in the scenario where gradient information for both the loss and constraint functions is unavailable. In this scenario, after submitting a decision, the decision maker observes only the values of the loss and constraint functions at a few random points near that decision. We first propose an online bandit algorithm based on a virtual queue, in which two-point feedback is used to approximate the gradient feedback. We then adopt the static benchmark to analyse the optimisation performance, and establish sub-linear expected static regret and sub-linear expected constraint violations for the proposed algorithm in the two-point bandit feedback setup. Moreover, the bounds on the expected static regret and constraint violations are further improved when the loss functions are strongly convex. Finally, a numerical simulation of online job scheduling demonstrates the performance of the proposed method and corroborates the theoretical guarantees.
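To illustrate how two function values can stand in for a gradient, the following is a minimal sketch of a standard symmetric two-point estimator. It is not taken from the paper: the exact estimator the authors use may differ, and the names `sample_unit_sphere`, `two_point_gradient`, the smoothing radius `delta`, and the linear test loss are all assumptions made for illustration. The virtual queue update is not shown.

```python
import numpy as np


def sample_unit_sphere(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)


def two_point_gradient(f, x, delta, rng):
    """Estimate the gradient of f at x from two function evaluations.

    Assumed form (a common choice, not necessarily the paper's):
        g = dim / (2 * delta) * (f(x + delta*u) - f(x - delta*u)) * u
    where u is uniform on the unit sphere.
    """
    dim = x.shape[0]
    u = sample_unit_sphere(dim, rng)
    diff = f(x + delta * u) - f(x - delta * u)
    return (dim / (2.0 * delta)) * diff * u


# Hypothetical linear loss with known gradient `a`, used only as a sanity check.
rng = np.random.default_rng(0)
a = np.array([1.0, -2.0, 0.5])
loss = lambda x: float(a @ x)

x = np.zeros(3)
estimates = np.array(
    [two_point_gradient(loss, x, 0.1, rng) for _ in range(20000)]
)
# Averaging many estimates should approach the true gradient `a`.
mean_estimate = estimates.mean(axis=0)
```

A single estimate is noisy, but for a linear loss its expectation equals the true gradient, so the average over many draws should be close to `a`. For general convex losses the estimator is unbiased only for a smoothed version of the loss, with the smoothing controlled by `delta`.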

Full Text