Federated Bandit: A Gossiping Approach

Zhaowei Zhu, Ji Liu, Yang Liu, Jingxuan Zhu

doi:10.1145/3410220.3453919

Abstract

In this paper, we study Federated Bandit, a decentralized Multi-Armed Bandit problem with a set of $N$ agents, who can only communicate their local data with neighbors described by a connected graph $G$. Each agent makes a sequence of decisions on selecting an arm from $M$ candidates, yet they only have access to local and potentially biased feedback/evaluation of the true reward for each action taken. Learning only locally will lead agents to sub-optimal actions, while converging to a no-regret strategy requires a collection of distributed data. Motivated by the proposal of federated learning, we aim for a solution in which agents never share their local observations with a central entity and are allowed to share only a private copy of their own information with their neighbors. We first propose a decentralized bandit algorithm \texttt{Gossip\_UCB}, which is a coupling of variants of both the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that \texttt{Gossip\_UCB} successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret of order $O(\max\{\texttt{poly}(N,M) \log T,\ \texttt{poly}(N,M) \log_{\lambda_2^{-1}} N\})$ for all agents, where $\lambda_2 \in (0,1)$ is the second largest eigenvalue of the expected gossip matrix, which is a function of $G$. We then propose \texttt{Fed\_UCB}, a differentially private version of \texttt{Gossip\_UCB}, in which the agents preserve $\epsilon$-differential privacy of their local data while achieving $O(\max\{\frac{\texttt{poly}(N,M)}{\epsilon} \log^{2.5} T,\ \texttt{poly}(N,M)(\log_{\lambda_2^{-1}} N + \log T)\})$ regret.
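The abstract does not give the update rules of \texttt{Gossip\_UCB}, so the following is only a toy sketch of the general idea of coupling gossip averaging with a UCB index, not the authors' algorithm. It assumes Gaussian reward noise, one randomly chosen edge averaging per round, and the textbook UCB bonus $\sqrt{2 \log t / n}$; the names `gossip_step`, `run_sketch`, `LOCAL_MEANS`, and `EDGES` are hypothetical. Each agent sees a biased local mean per arm, and the "true" reward of an arm is taken here to be the average of the local means over all agents.

```python
import math
import random


def gossip_step(estimates, edges, rng):
    """Average the estimate vectors of the two endpoints of one random edge."""
    i, j = rng.choice(edges)
    averaged = [(a + b) / 2.0 for a, b in zip(estimates[i], estimates[j])]
    estimates[i] = list(averaged)
    estimates[j] = list(averaged)


def run_sketch(local_means, edges, horizon, noise=0.1, seed=0):
    """Toy gossip + UCB loop (illustrative assumption, not the paper's method).

    Returns (pull counts, local sample means, gossiped estimates),
    each indexed as [agent][arm].
    """
    rng = random.Random(seed)
    n_agents, n_arms = len(local_means), len(local_means[0])
    counts = [[0] * n_arms for _ in range(n_agents)]
    sample_means = [[0.0] * n_arms for _ in range(n_agents)]
    estimates = [[0.0] * n_arms for _ in range(n_agents)]

    for t in range(1, horizon + 1):
        for i in range(n_agents):
            if t <= n_arms:
                arm = t - 1  # pull every arm once before using the index
            else:
                # UCB index: gossiped estimate plus a bonus from the local count.
                arm = max(
                    range(n_arms),
                    key=lambda k: estimates[i][k]
                    + math.sqrt(2.0 * math.log(t) / counts[i][k]),
                )
            # Local, biased feedback: this agent's own mean plus noise.
            reward = local_means[i][arm] + rng.gauss(0.0, noise)
            counts[i][arm] += 1
            old = sample_means[i][arm]
            new = old + (reward - old) / counts[i][arm]
            sample_means[i][arm] = new
            # Inject only the change in the local sample mean, so the
            # network-wide sum of estimates equals the sum of local means.
            estimates[i][arm] += new - old
        # Agents exchange information only along an edge of the graph.
        gossip_step(estimates, edges, rng)
    return counts, sample_means, estimates


# Three agents on a path graph 0 - 1 - 2, two arms.
# Agent 0 locally prefers arm 0, but arm 1 has the higher network-wide average.
LOCAL_MEANS = [[0.9, 0.5], [0.0, 0.9], [0.0, 0.9]]
EDGES = [(0, 1), (1, 2)]
counts, sample_means, estimates = run_sketch(LOCAL_MEANS, EDGES, horizon=2000)
print(counts)
```

In this sketch, pairwise averaging leaves the per-arm sum of estimates over agents unchanged, which is why each agent's estimate can track the network-wide average even though it only ever observes its own biased rewards.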

Full Text