Abstract

In this paper, we devise inverse reinforcement learning (RL) algorithms for nonlinear continuous-time systems governed by multiplayer differential equations. We define a new class of Multi-player Noncooperative Apprentice Games, in which both the expert and the learner have N-player control inputs. The games are solved by the learner reconstructing the unknown performance reward functions of the expert from the expert's trajectories, i.e., its states and optimal control inputs. We first develop a model-based inverse RL algorithm with two learning stages: an optimal control learning stage followed by an inverse optimal control (IOC) learning stage. Because the algorithm solves IOC as a subproblem, it provides one possible unified framework for inverse RL and IOC in multiplayer differential dynamic systems. We then develop two inverse RL algorithms using neural networks: a completely model-free algorithm for homogeneous control inputs, and a partially model-free algorithm for heterogeneous control inputs. Finally, we present simulation results that verify the validity of the proposed algorithms.
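
For concreteness, the following is a minimal sketch of the setup commonly assumed in such N-player games; the symbols f, g_j, u_j, Q_i, and R_ij are illustrative and not necessarily the paper's exact notation. Each player applies a control input to shared control-affine nonlinear dynamics, and each player's performance is scored by its own cost functional:

\[
\dot{x} = f(x) + \sum_{j=1}^{N} g_j(x)\, u_j,
\qquad
J_i = \int_0^{\infty} \Big( Q_i(x) + \sum_{j=1}^{N} u_j^{\top} R_{ij}\, u_j \Big)\, dt,
\quad i = 1, \dots, N.
\]

Under this formulation, inverse RL amounts to the learner inferring each expert player's unknown reward parameters (here, Q_i and R_ij) from the observed state and control trajectories, rather than being given them in advance.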
