Adversarial Attacks on Computation of the Modified Policy Iteration Method

Ali Yekkehkhany, Javad Lavaei, Han Feng

doi:10.1109/cdc45484.2021.9683559

Abstract

Adversarial attacks on Markov decision processes (MDPs) and reinforcement learning (RL) have been studied in the literature in the context of robust learning and adversarial game theory. In this paper, we introduce a new notion of adversarial attacks on MDP and RL computation, motivated by the emergence of edge computing. The large-scale computation of MDP and RL models, in the form of value/policy iteration and Q-learning, is being offloaded from agents to distributed servers, giving rise to edge reinforcement learning. Because edge RL is inherently distributed, the MDP/RL computation can be prone to adversarial attacks in different forms. We analyze a probabilistic model of adversarial attacks on the computation of the modified policy iteration method, in which the principal contraction property of the Bellman operator is undermined with a certain probability in iterations of the policy evaluation step. Such an attack can lure the agent into searching among suboptimal policies without improving the true values of those policies. We prove that, under certain conditions, the attacked modified policy iteration method can still converge to the vicinity of the optimal policy with high probability if the number of policy evaluation iterations is larger than a threshold that is logarithmic in the inverse of a desired precision. We also provide an upper bound on the number of iterations needed for the attacked modified policy iteration method to terminate, which holds with an associated confidence level.
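To make the setting concrete, the following is a minimal Python sketch of modified policy iteration in which each policy evaluation iteration is attacked with a fixed probability. It is an illustration under stated assumptions, not the attack model analyzed in the paper: the function names, the two-state MDP, and the specific attack (skipping the Bellman update and adding bounded noise to the current value estimate, so that the iteration does not contract) are all hypothetical choices made for this example.

```python
import random


def greedy_policy(P, R, v, gamma):
    """Policy improvement: pick the action maximizing the one-step lookahead."""
    n = len(R)
    policy = []
    for s in range(n):
        q = [R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n))
             for a in range(len(R[s]))]
        policy.append(max(range(len(q)), key=q.__getitem__))
    return policy


def evaluate_step(P, R, v, policy, gamma):
    """One application of the Bellman operator for a fixed policy."""
    n = len(v)
    return [R[s][policy[s]]
            + gamma * sum(P[s][policy[s]][t] * v[t] for t in range(n))
            for s in range(n)]


def attacked_mpi(P, R, gamma, m, attack_prob, noise, outer_iters, rng):
    """Modified policy iteration with m evaluation iterations per round.

    ASSUMPTION (hypothetical attack): with probability attack_prob an
    evaluation iteration is replaced by a bounded random perturbation of v,
    so that iteration does not contract toward the policy's value.
    """
    v = [0.0] * len(R)
    policy = greedy_policy(P, R, v, gamma)
    for _ in range(outer_iters):
        policy = greedy_policy(P, R, v, gamma)
        for _ in range(m):
            if rng.random() < attack_prob:
                v = [x + rng.uniform(-noise, noise) for x in v]
            else:
                v = evaluate_step(P, R, v, policy, gamma)
    return policy, v


# Hypothetical 2-state, 2-action MDP: P[s][a][t] is the transition
# probability from s to t under action a, and R[s][a] is the reward.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 moves to 0
]
R = [[0.0, 1.0], [2.0, 0.0]]
GAMMA = 0.9

clean_policy, clean_v = attacked_mpi(
    P, R, GAMMA, m=5, attack_prob=0.0, noise=0.0,
    outer_iters=20, rng=random.Random(0))
attacked_policy, attacked_v = attacked_mpi(
    P, R, GAMMA, m=5, attack_prob=0.2, noise=0.005,
    outer_iters=20, rng=random.Random(0))
print(clean_policy, clean_v)
print(attacked_policy, attacked_v)
```

In this toy instance the optimal policy takes action 1 in state 0 and action 0 in state 1, with optimal values 19 and 20. The perturbation is small enough that the attacked run still returns the optimal policy, while its value estimate may be less accurate than in the clean run. Raising `attack_prob` or `noise`, or lowering `m`, is a simple way to explore how the number of evaluation iterations interacts with the attack.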

Full Text