Abstract

We study bandits with graph-structured feedback, where a learner repeatedly selects an arm and then observes the rewards of the chosen arm as well as those of its neighbors in the feedback graph. Existing work on graphical bandits assumes either stochastic rewards or adversarial rewards, two extremes that rarely occur in real-world scenarios. In this paper, we study graphical bandits with a reward model that interpolates between the two extremes, in which the rewards are stochastically generated overall but a small fraction of them can be adversarially corrupted. For this problem, we propose an online algorithm that exploits the stochastic pattern while tolerating the adversarial corruptions. The main idea is to restrict exploration to carefully designed independent sets of the feedback graph and to perform exploitation via a soft version of arm elimination. Theoretical analysis shows that our algorithm attains an $O(\alpha \ln K \ln T + \alpha C)$ regret bound, where $\alpha$ is the independence number of the feedback graph, $K$ is the number of arms, $T$ is the time horizon, and $C$ quantifies the total corruption introduced by the adversary. The effectiveness of our algorithm is demonstrated by numerical experiments.
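To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of exploration restricted to an independent set of the feedback graph combined with soft (weight-halving rather than hard) arm elimination. This is not the paper's algorithm: the greedy independent-set routine, the confidence width, the halving/recovery factors, the activity threshold, and the `reward_fn` interface are all illustrative assumptions.

```python
import math
import random

def greedy_independent_set(neighbors, active):
    """Greedily build a maximal independent set among the active arms.

    `neighbors[a]` is the set of arms whose rewards are observed when
    arm `a` is pulled (its neighborhood in the feedback graph).
    """
    chosen, blocked = [], set()
    for a in sorted(active):
        if a not in blocked:
            chosen.append(a)
            blocked |= neighbors[a] | {a}
    return chosen

def run_graphical_bandit(neighbors, reward_fn, T, seed=0):
    """Explore over independent sets; exploit via soft arm elimination."""
    rng = random.Random(seed)
    K = len(neighbors)
    counts = [0] * K
    means = [0.0] * K
    weights = [1.0] * K  # soft-elimination weights: shrunk, never zeroed
    for t in range(1, T + 1):
        # An arm stays "active" while its weight is not too small
        # relative to the current leader (hypothetical threshold).
        wmax = max(weights)
        active = {a for a in range(K) if weights[a] >= wmax / K}
        # Explore only inside an independent set of the active arms, so
        # each pull covers a distinct part of the feedback graph.
        arm = rng.choice(greedy_independent_set(neighbors, active))
        # Pulling `arm` reveals its own reward and its neighbors' rewards.
        for a in neighbors[arm] | {arm}:
            counts[a] += 1
            means[a] += (reward_fn(a, t) - means[a]) / counts[a]
        # Soft elimination: halve the weight of an arm whose empirical mean
        # trails the leader by more than a confidence width, instead of
        # discarding it outright, so corrupted rounds can be outgrown.
        best = max(means)
        for a in range(K):
            width = math.sqrt(2.0 * math.log(t + 1) / max(counts[a], 1))
            if best - means[a] > width:
                weights[a] *= 0.5
            else:
                weights[a] = min(1.0, weights[a] * 2.0)  # allow recovery

    return means, weights

if __name__ == "__main__":
    # Toy instance: 4 arms on a path graph 0-1-2-3 (symmetric neighborhoods).
    nbrs = [{1}, {0, 2}, {1, 3}, {2}]

    def reward_fn(a, t):
        base = [0.3, 0.5, 0.8, 0.4][a]
        if a == 0 and t <= 50:  # adversary corrupts arm 0 early on
            return 1.0
        return base + random.uniform(-0.1, 0.1)

    means, weights = run_graphical_bandit(nbrs, reward_fn, T=2000)
    print("empirical means:", [round(m, 2) for m in means])
    print("weights        :", [round(w, 3) for w in weights])
```

The soft elimination is what provides corruption tolerance in this sketch: a hard-eliminated arm can never return, so early corrupted observations could permanently discard the best arm, whereas a halved weight recovers once enough clean samples pull the empirical mean back within the confidence width.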
