Abstract

This paper proposes a data-driven, model-free inverse reinforcement learning (RL) algorithm to reconstruct the unknown cost function of a demonstrated discrete-time (DT) dynamical system subject to antagonistic disturbances. We first propose an inverse RL policy iteration scheme that uses the system dynamics and the input policies; from it we derive our main result, a data-driven off-policy inverse Q-learning algorithm that uses only demonstrated trajectories of the antagonistic system, without knowledge of the system dynamics or the control policy gains. The data-driven algorithm consists of three steps: Q-function evaluation, state-penalty weight improvement, and action-policy update. We show that the data-driven algorithm yields unbiased estimates when exploration noise is added to satisfy the persistence-of-excitation condition. An example verifies the proposed algorithm.
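To make the three-step iteration structure concrete, the following is a minimal sketch of the model-based policy iteration scheme the abstract mentions as the stepping stone, in the common zero-sum linear-quadratic setting. The linear dynamics x_{k+1} = A x_k + B u_k + D w_k, the quadratic cost with hidden state weight Qs, the specific update rules, and all function and variable names are illustrative assumptions, not the paper's actual (data-driven, model-free) equations.

```python
# Structural sketch only: the state-weight update here is a placeholder
# heuristic, NOT the paper's state-penalty weight improvement rule.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def policy_gains(P, A, B, D, R, gamma):
    """Zero-sum LQ game gains u = K x, w = L x from value matrix P (assumed form)."""
    top = np.hstack([R + B.T @ P @ B, B.T @ P @ D])
    bot = np.hstack([D.T @ P @ B, D.T @ P @ D - gamma**2 * np.eye(D.shape[1])])
    KL = -np.linalg.solve(np.vstack([top, bot]),
                          np.vstack([B.T @ P @ A, D.T @ P @ A]))
    m = B.shape[1]
    return KL[:m], KL[m:]  # control gain K, disturbance gain L

def inverse_rl_iteration(A, B, D, R, gamma, K_expert, n_iter=100, lr=0.5):
    n = A.shape[0]
    Qs = np.eye(n)                        # initial guess of the hidden state penalty
    K = np.zeros((B.shape[1], n))
    L = np.zeros((D.shape[1], n))
    for _ in range(n_iter):
        # 1) Q-function (value) evaluation under the current Qs, K, L:
        #    solve P = (A+BK+DL)' P (A+BK+DL) + Qs + K'RK - gamma^2 L'L.
        Ac = A + B @ K + D @ L
        M = Qs + K.T @ R @ K - gamma**2 * L.T @ L
        P = solve_discrete_lyapunov(Ac.T, M)
        # 2) State-penalty weight improvement: nudge Qs so the learner's
        #    gain moves toward the demonstrated gain (placeholder rule).
        dK = K_expert - K
        Qs = Qs + lr * dK.T @ dK
        # 3) Action-policy update from the evaluated value matrix.
        K, L = policy_gains(P, A, B, D, R, gamma)
    return Qs, K, L
```

Under these assumptions, calling inverse_rl_iteration with a demonstrated gain K_expert iterates until the recovered Qs induces a game-optimal gain close to the expert's; the paper's data-driven algorithm replaces the model-based steps above with estimates computed from demonstrated trajectories alone.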
