Abstract
In this paper, we devise inverse reinforcement learning (RL) algorithms for nonlinear continuous-time multiplayer systems described by differential equations. We define a new class of Multi-player Noncooperative Apprentice Games, in which both the expert and the learner have N-player control inputs. The games are solved by having the learner reconstruct the unknown performance reward functions of the expert from the expert's trajectories, i.e., its states and optimal control inputs. We first develop a model-based inverse RL algorithm that involves two learning stages: an optimal control learning stage followed by an inverse optimal control (IOC) learning stage. Our algorithm solves IOC as a subproblem, thereby providing one possible unified framework for inverse RL and IOC in multiplayer differential dynamic systems. We then develop two inverse RL algorithms using neural networks: a completely model-free algorithm for homogeneous control inputs, and a partially model-free algorithm for heterogeneous control inputs. Finally, we present simulation results that verify the validity of the proposed algorithms.
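To make the two-stage structure concrete, the sketch below illustrates the model-based idea on a deliberately simplified problem: a single-player linear-quadratic reduction, not the paper's nonlinear N-player setting. The learner observes only the expert's optimal feedback gain and alternates between an optimal control stage (computing its own optimal policy for the current reward estimate) and an IOC stage (correcting the reward estimate against the expert's closed loop). All names and values here (A, B, R, Q_true, K_e, and the fixed-point update itself) are illustrative assumptions, not the paper's algorithm.

```python
# Minimal illustrative sketch (assumed, simplified): a single-player
# linear-quadratic reduction of the two-stage model-based inverse RL idea.
# Everything below (A, B, R, Q_true, the fixed-point update) is hypothetical.
import numpy as np
from scipy.linalg import solve_continuous_are

# Known dynamics dx/dt = A x + B u and known control weight R (assumptions).
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])
R = np.array([[1.0]])

# Expert's hidden state-reward weight; the learner never sees Q_true,
# only the expert's optimal feedback gain K_e (u = -K_e x).
Q_true = np.diag([10.0, 1.0])
P_e = solve_continuous_are(A, B, Q_true, R)
K_e = np.linalg.solve(R, B.T @ P_e)
A_e = A - B @ K_e  # expert's closed-loop matrix, recoverable from data

Q_hat = np.eye(2)  # learner's initial reward guess
for _ in range(100):
    # Stage 1 (optimal control learning): learner computes its own optimal
    # value matrix and gain under the current reward estimate Q_hat.
    P = solve_continuous_are(A, B, Q_hat, R)
    K = np.linalg.solve(R, B.T @ P)
    # Stage 2 (IOC learning): correct Q_hat so the expert's closed loop is
    # consistent with the learner's value matrix, i.e. enforce
    # A_e^T P + P A_e + K_e^T R K_e + Q_hat = 0 as a fixed-point update.
    Q_next = -(A_e.T @ P + P @ A_e) - K_e.T @ R @ K_e
    if np.linalg.norm(Q_next - Q_hat) < 1e-10:
        break
    Q_hat = Q_next

# At convergence the learner's gain reproduces the expert's behavior; note
# that IOC rewards are generally non-unique, so Q_hat need not equal Q_true.
print("gain error:", np.linalg.norm(K - K_e))
```

In the paper's neural-network variants, the model-based Riccati solves in Stage 1 would be replaced by RL updates driven by trajectory data (completely model-free for homogeneous inputs, partially model-free for heterogeneous inputs); the structural point carried over from the abstract is the alternation between a forward optimal control stage and an IOC reward-correction stage.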