Abstract

The stochastic multi-armed bandit is a classical reinforcement learning model in which a learning agent sequentially chooses an action (pulls a bandit arm) and the environment responds with a stochastic reward drawn from an unknown distribution associated with the chosen action. A popular objective for the agent is to identify the arm with the maximum expected reward, known as the best arm identification problem. We address the security concerns that arise in a cross-silo federated learning setting, where multiple data owners collaborate under the orchestration of a server to execute a best arm identification algorithm. We propose three secure protocols that guarantee desirable security properties for the input data (i.e., reward values), the intermediate data (i.e., sums of rewards), and the output data (i.e., the ranking of arms and, in particular, the identified best arm). More precisely: (1) no data owner can learn the identified best arm, nor any local data pertaining to another data owner; (2) the orchestration participants cannot learn the identified best arm, any reward value, or any sum of rewards; (3) by analyzing the messages exchanged over the network, an external observer cannot learn the identified best arm, any reward value, or any sum of rewards. Each protocol has a different architecture, uses different techniques, and offers a different trade-off with respect to several criteria that we thoroughly analyze: the number of participants, the generality of the supported reward functions, the cryptographic overhead, and the communication cost. To build our protocols, we rely on secure multi-party computation, AES-CBC, and the additive homomorphic property of the Paillier cryptosystem.
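As a point of reference for the non-secure, single-agent setting described above, best arm identification can be sketched minimally as uniform sampling followed by an empirical argmax. The arm distributions and parameters below are hypothetical illustrations, not taken from the paper, and this is a naive baseline rather than any of the protocols it proposes:

```python
import random

def best_arm(reward_fns, pulls_per_arm, rng):
    """Pull each arm `pulls_per_arm` times and return the index of the
    arm with the highest empirical mean reward (naive baseline)."""
    means = []
    for pull in reward_fns:
        total = sum(pull(rng) for _ in range(pulls_per_arm))
        means.append(total / pulls_per_arm)
    return max(range(len(means)), key=means.__getitem__)

# Hypothetical Gaussian arms; the true best arm is index 1 (mean 0.8).
arms = [lambda r: r.gauss(0.2, 1.0),
        lambda r: r.gauss(0.8, 1.0),
        lambda r: r.gauss(0.5, 1.0)]
print(best_arm(arms, 500, random.Random(0)))
```

With 500 pulls per arm the empirical means concentrate well within the 0.3 gaps between arms, so the sampled best arm matches the true one with high probability.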

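As a minimal, insecure illustration of the additive homomorphic property the protocols rely on, here is a toy Paillier sketch with tiny fixed primes (the primes and helper names are illustrative assumptions; a real deployment would use a vetted cryptographic library with large keys):

```python
import math
import random

# Toy Paillier cryptosystem -- tiny primes, for illustration only, NOT secure.
p, q = 104729, 104723          # small demo primes
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)   # Carmichael function of n
mu = pow(lam, -1, n)           # modular inverse of lam mod n (Python 3.8+)

def encrypt(m):
    """Encrypt plaintext m in [0, n) with randomness r coprime to n."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Recover m via L(c^lam mod n^2) * mu mod n, where L(x) = (x-1)//n."""
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Additive homomorphism: multiplying ciphertexts adds the plaintexts,
# so an aggregator can sum rewards without seeing any individual value.
c_sum = encrypt(12) * encrypt(30) % n2
print(decrypt(c_sum))  # 42
```

This is the property that lets a server aggregate encrypted sums of rewards: it multiplies ciphertexts it cannot read, and only the key holder can decrypt the total.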