Abstract

This paper proposes a data-driven, model-free inverse reinforcement learning (RL) algorithm to reconstruct the unknown cost function of a demonstrated discrete-time (DT) dynamical system subject to antagonistic disturbances. We first develop an inverse RL policy iteration scheme that requires knowledge of the system dynamics and the input policies, and from it derive our main result: a data-driven off-policy inverse Q-learning algorithm that uses only demonstrated trajectories of the system under antagonistic disturbances, with no knowledge of the system dynamics or the control policy gains. This data-driven algorithm consists of Q-function evaluation, state-penalty weight improvement, and action policy updates. We show that the data-driven estimates remain unbiased when exploration noise is injected to satisfy the persistence-of-excitation condition. An example verifies the proposed algorithm.
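
To make the three components concrete, the sketch below (not the authors' algorithm; all system matrices, gains, weights, and noise levels are hypothetical illustration choices) works through their mechanics for a linear DT system x_{k+1} = A x_k + B u_k + D w_k with quadratic utility x'Q_x x + u'R u - γ² w'w, where w is the antagonistic input: off-policy least-squares Q-function evaluation from noisy exploratory data, action-policy updates read from the estimated kernel, and the state-penalty weight read back from demonstrated transitions via the Bellman equation. The read-back is shown at the algorithm's fixed point; the outer iteration that drives the weight toward expert consistency is omitted.

```python
# Minimal sketch of the three data-driven components named in the abstract,
# for a linear-quadratic zero-sum setting. All numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n, m, q = 2, 1, 1                         # state, control, disturbance dims
A = np.array([[0.9, 0.2], [0.0, 0.8]])    # dynamics: used only to simulate
B = np.array([[0.0], [1.0]])              # data and to check the results,
D = np.array([[0.1], [0.0]])              # never inside the estimators
R = np.eye(m)
gamma = 5.0
Qx = np.diag([4.0, 1.0])                  # state-penalty weight under evaluation
Ku = np.array([[0.3, 0.6]])               # control policy      u = -Ku x
Kw = np.array([[0.05, 0.0]])              # antagonist policy   w = -Kw x

def quad_basis(z):
    """Basis so that z' H z = quad_basis(z) @ vech(H) for symmetric H."""
    i, j = np.triu_indices(len(z))
    return np.where(i == j, 1.0, 2.0) * np.outer(z, z)[i, j]

def unvech(theta, d):
    """Rebuild the symmetric matrix from the vech ordering used above."""
    Hu = np.zeros((d, d))
    Hu[np.triu_indices(d)] = theta
    return Hu + np.triu(Hu, 1).T

# ---- Q-function evaluation (off-policy, from data) ----
# Behavior inputs carry exploration noise; the Bellman target re-evaluates
# the *policies* at x_{k+1}, so the noise never enters the target.
rows, rhs = [], []
for _ in range(80):                       # single-step transitions from rich
    x = rng.standard_normal(n)            # initial states keep the regressors
    u = -Ku @ x + 0.5 * rng.standard_normal(m)   # well conditioned
    w = -Kw @ x + 0.5 * rng.standard_normal(q)
    xn = A @ x + B @ u + D @ w
    r = x @ Qx @ x + u @ R @ u - gamma**2 * (w @ w)
    z = np.concatenate([x, u, w])
    zn = np.concatenate([xn, -Ku @ xn, -Kw @ xn])   # on-policy target
    rows.append(quad_basis(z) - quad_basis(zn))
    rhs.append(r)
theta, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
H = unvech(theta, n + m + q)

# ---- Model-based cross-check of the estimated kernel ----
Ac = A - B @ Ku - D @ Kw
M = Qx + Ku.T @ R @ Ku - gamma**2 * Kw.T @ Kw
P = np.zeros((n, n))
for _ in range(1000):                     # value of the evaluated policies:
    P = M + Ac.T @ P @ Ac                 # P = M + Ac' P Ac
F = np.hstack([A, B, D])
W = np.zeros((n + m + q, n + m + q))
W[:n, :n], W[n:n + m, n:n + m], W[n + m:, n + m:] = Qx, R, -gamma**2 * np.eye(q)
print("kernel error:", np.abs(H - (W + F.T @ P @ F)).max())

# ---- Action-policy update from the estimated kernel (model-free) ----
# Stationarity of z'Hz in (u, w) gives the updated gains directly.
Hb = np.linalg.solve(H[n:, n:], H[n:, :n])
Ku_new, Kw_new = Hb[:m], Hb[m:]
print("updated gains:", Ku_new, Kw_new)

# ---- State-penalty weight read-back from demonstrated transitions ----
# At the fixed point, the weight making the demonstrations Bellman-consistent
# satisfies  x'Qx x = z'Hz - zn'Hzn - u'Ru + gamma^2 w'w.
rows, rhs = [], []
for _ in range(20):
    x = rng.standard_normal(n)
    u, w = -Ku @ x, -Kw @ x               # demonstrated (noise-free) inputs
    xn = A @ x + B @ u + D @ w
    z = np.concatenate([x, u, w])
    zn = np.concatenate([xn, -Ku @ xn, -Kw @ xn])
    rows.append(quad_basis(x))
    rhs.append(z @ H @ z - zn @ H @ zn - u @ R @ u + gamma**2 * (w @ w))
thq, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
print("recovered state weight:\n", unvech(thq, n))
```

In this linear-quadratic setting the design choice behind the unbiasedness claim is visible directly: because the Bellman target re-evaluates the policies at x_{k+1} rather than reusing the noisy behavior inputs, the exploration noise appears only in the regressors, and the least-squares estimate of the kernel is exact up to conditioning.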
