Abstract

This article proposes new inverse reinforcement learning (RL) algorithms to solve our defined Adversarial Apprentice Games for nonlinear learner and expert systems. The games are solved by having a learner reconstruct the unknown cost function of an expert from the expert's demonstrated behaviors. We first develop a model-based inverse RL algorithm that consists of two learning stages: an optimal control learning stage and a second learning stage based on inverse optimal control. This algorithm also clarifies the relationship between inverse RL and inverse optimal control. We then propose a new model-free integral inverse RL algorithm to reconstruct the unknown expert cost function. The model-free algorithm requires only online trajectory data demonstrated by the expert and the learner, without knowledge of the system dynamics of either agent. Both algorithms are further implemented using neural networks (NNs). In Adversarial Apprentice Games, the learner and the expert are allowed to suffer from different adversarial attacks during learning. A two-player zero-sum game is formulated for each of the two agents and is solved as a subproblem for the learner in inverse RL. Furthermore, it is shown that the cost functions the learner learns in order to mimic the expert's behavior are stabilizing but not unique. Finally, simulations and comparisons demonstrate the effectiveness and superiority of the proposed algorithms.
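
As a rough sketch of the kind of subproblem the abstract refers to (illustrative notation only; the specific weights, dynamics, and update rules are not given in the abstract), a two-player zero-sum game for each agent can be posed with a minimax cost over control u and adversarial disturbance d, and an inverse RL step can then fit the learner's cost weight to the expert's demonstrated control:

\[
\begin{aligned}
V(x_0) &= \min_{u}\,\max_{d}\;\int_{0}^{\infty}
          \bigl( x^{\top} Q x + u^{\top} R u - \gamma^{2} d^{\top} d \bigr)\, dt ,\\
\hat{Q} &\in \arg\min_{Q \succeq 0}\;\sum_{k}
          \bigl\| u_{\text{learner}}(x_k ; Q) - u_{\text{expert}}(x_k) \bigr\|^{2} ,
\end{aligned}
\]

where Q, R, and the attenuation level gamma are assumed placeholder symbols and the fitting criterion is a generic least-squares illustration; the paper's exact inverse-optimal-control condition and cost structure may differ.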
