Learning adversarial attack policies through multi-objective reinforcement learning

Javier García,Rubén Majadas,Fernando Fernández

doi:10.1016/j.engappai.2020.104021

Javier García, Rubén Majadas + Show 1 more

https://doi.org/10.1016/j.engappai.2020.104021

Copy DOI

Export

Save

Cite

Abstract
Full-Text
Similar Papers

Abstract

Listen

Deep Reinforcement Learning has shown promising results in learning policies for complex sequential decision-making tasks. However, different adversarial attack strategies have revealed the weakness of these policies to perturbations to their observations. Most of these attacks have been built on existing adversarial example crafting techniques used to fool classifiers, where an adversarial attack is considered a success if it makes the classifier outputs any wrong class. The major drawback of these approaches when applied to decision-making tasks is that they are blind for long-term goals. In contrast, this paper suggests that it is more appropriate to view the attack process as a sequential optimization problem, with the aim of learning a sequence of attacks, where the attacker must consider the long-term effects of each attack. In this paper, we propose that such an attack policy must be learned with two objectives in view. On the one hand, the attack must pursue the maximum performance loss of the attacked policy. On the other hand, it also should minimize the cost of the attacks. Therefore, in this paper we propose a novel modelization of the process of learning an attack policy as a Multi-objective Markov Decision Process with two objectives: maximizing the performance loss of the attacked policy and minimizing the cost of the attacks. We also reveal the conflicting nature of these two objectives and use a Multi-objective Reinforcement Learning algorithm to draw the Pareto fronts for four well-known tasks: the GridWorld, the Cartpole, the Mountain car and the Breakout.

Full Text