Abstract

In this paper, we devise inverse reinforcement learning (RL) algorithms for nonlinear continuous-time systems governed by multiplayer differential equations. We define a new class of Multi-player Noncooperative Apprentice Games, in which both the expert and the learner have N-player control inputs. The games are solved by the learner reconstructing the unknown performance reward functions of the expert from the expert's trajectories, i.e., its states and optimal control inputs. We first develop a model-based inverse RL algorithm with two learning stages: an optimal control learning stage followed by an inverse optimal control (IOC) learning stage. Because the algorithm solves IOC as a subproblem, it provides one possible unified framework for inverse RL and IOC in multiplayer differential dynamic systems. We then develop two inverse RL algorithms using neural networks: a completely model-free algorithm for homogeneous control inputs, and a partially model-free algorithm for heterogeneous control inputs. Finally, we present simulation results that verify the validity of the proposed algorithms.
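
For concreteness, the following is a minimal sketch of the setup commonly assumed in such N-player games; the symbols f, g_j, u_j, Q_i, and R_ij are illustrative and not necessarily the paper's exact notation. Each player applies a control input to shared control-affine nonlinear dynamics, and each player's performance is scored by its own cost functional:

\[
\dot{x} = f(x) + \sum_{j=1}^{N} g_j(x)\, u_j,
\qquad
J_i = \int_0^{\infty} \Big( Q_i(x) + \sum_{j=1}^{N} u_j^{\top} R_{ij}\, u_j \Big)\, dt,
\quad i = 1, \dots, N.
\]

Under this formulation, inverse RL amounts to the learner inferring each expert player's unknown reward parameters (here, Q_i and R_ij) from the observed state and control trajectories, rather than being given them in advance.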
