Abstract

This paper proposes a data-driven, model-free inverse reinforcement learning (RL) algorithm to reconstruct the unknown cost function of a demonstrated discrete-time (DT) dynamical system subject to antagonistic disturbances. We first propose an inverse RL policy iteration scheme that uses the system dynamics and the input policies; from it we derive our main result, a data-driven off-policy inverse Q-learning algorithm that uses only demonstrated trajectories of the antagonistic system, without knowledge of the system dynamics or the control policy gains. The data-driven algorithm consists of three steps: Q-function evaluation, state-penalty weight improvement, and action-policy update. We show that the data-driven algorithm yields unbiased estimates when exploration noise is added to satisfy the persistence-of-excitation condition. An example verifies the proposed algorithm.
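To make the three-step iteration structure concrete, the following is a minimal sketch of the model-based policy iteration scheme the abstract mentions as the stepping stone, in the common zero-sum linear-quadratic setting. The linear dynamics x_{k+1} = A x_k + B u_k + D w_k, the quadratic cost with hidden state weight Qs, the specific update rules, and all function and variable names are illustrative assumptions, not the paper's actual (data-driven, model-free) equations.

```python
# Structural sketch only: the state-weight update here is a placeholder
# heuristic, NOT the paper's state-penalty weight improvement rule.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def policy_gains(P, A, B, D, R, gamma):
    """Zero-sum LQ game gains u = K x, w = L x from value matrix P (assumed form)."""
    top = np.hstack([R + B.T @ P @ B, B.T @ P @ D])
    bot = np.hstack([D.T @ P @ B, D.T @ P @ D - gamma**2 * np.eye(D.shape[1])])
    KL = -np.linalg.solve(np.vstack([top, bot]),
                          np.vstack([B.T @ P @ A, D.T @ P @ A]))
    m = B.shape[1]
    return KL[:m], KL[m:]  # control gain K, disturbance gain L

def inverse_rl_iteration(A, B, D, R, gamma, K_expert, n_iter=100, lr=0.5):
    n = A.shape[0]
    Qs = np.eye(n)                        # initial guess of the hidden state penalty
    K = np.zeros((B.shape[1], n))
    L = np.zeros((D.shape[1], n))
    for _ in range(n_iter):
        # 1) Q-function (value) evaluation under the current Qs, K, L:
        #    solve P = (A+BK+DL)' P (A+BK+DL) + Qs + K'RK - gamma^2 L'L.
        Ac = A + B @ K + D @ L
        M = Qs + K.T @ R @ K - gamma**2 * L.T @ L
        P = solve_discrete_lyapunov(Ac.T, M)
        # 2) State-penalty weight improvement: nudge Qs so the learner's
        #    gain moves toward the demonstrated gain (placeholder rule).
        dK = K_expert - K
        Qs = Qs + lr * dK.T @ dK
        # 3) Action-policy update from the evaluated value matrix.
        K, L = policy_gains(P, A, B, D, R, gamma)
    return Qs, K, L
```

Under these assumptions, calling inverse_rl_iteration with a demonstrated gain K_expert iterates until the recovered Qs induces a game-optimal gain close to the expert's; the paper's data-driven algorithm replaces the model-based steps above with estimates computed from demonstrated trajectories alone.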
