Abstract

This paper focuses on the development of an online data-driven model-based inverse reinforcement learning (MBIRL) technique for linear and nonlinear deterministic systems. Input and output trajectories of an agent under observation, attempting to optimize an unknown reward function, are used to estimate the reward function and the corresponding unknown optimal value function, online and in real time. To achieve MBIRL using limited data, a novel feedback-driven approach is developed. The feedback policy and the dynamic model of the agent under observation are estimated from the measured data, and the estimates are used to generate synthetic data to drive MBIRL. Theoretical guarantees for ultimate boundedness of the estimation errors in general, and convergence of the estimation errors to zero in special cases, are derived using Lyapunov techniques. Proof-of-concept numerical experiments demonstrate the utility of the developed method to solve linear and nonlinear inverse reinforcement learning problems.
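To make the feedback-driven idea concrete, the sketch below walks through the pipeline for a discrete-time linear-quadratic agent: measure input/state trajectories, estimate the dynamics and feedback policy by least squares, roll the estimates out to generate synthetic data, and fit a quadratic reward and value function to the Bellman and policy-stationarity conditions along that synthetic data. All system matrices, the discrete-time LQ setting, the batch least-squares fit, and the choice to fix the input weight to the identity (the reward is only identifiable up to scaling) are illustrative assumptions; the paper develops online update laws with Lyapunov-based guarantees, which this sketch does not reproduce.

```python
import numpy as np

# Minimal sketch of the feedback-driven MBIRL idea for a discrete-time
# linear-quadratic observed agent.  All matrices, dimensions, and the
# batch least-squares formulation are illustrative assumptions, not the
# paper's exact parameterization or online update laws.

rng = np.random.default_rng(0)

# --- "True" agent (unknown to the observer) --------------------------------
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q_true = np.diag([2.0, 1.0])   # unknown state-cost weight
R_true = np.eye(1)             # input-cost weight, fixed to I for identifiability

# Optimal LQR gain of the observed agent (Riccati equation by value iteration)
P = np.eye(2)
for _ in range(500):
    K = np.linalg.solve(R_true + B.T @ P @ B, B.T @ P @ A)
    P = Q_true + A.T @ P @ (A - B @ K)

# --- Step 1: measure input/state trajectories of the agent -----------------
X, U, Xn = [], [], []
x = np.array([1.0, -1.0])
for _ in range(40):
    u = -K @ x + 0.01 * rng.standard_normal(1)   # small exploration noise
    xn = A @ x + B @ u
    X.append(x); U.append(u); Xn.append(xn)
    x = xn
X, U, Xn = map(np.array, (X, U, Xn))

# --- Step 2: estimate the dynamics and the feedback policy (least squares) -
Z = np.hstack([X, U])                              # regressor [x_k, u_k]
AB_hat = np.linalg.lstsq(Z, Xn, rcond=None)[0].T   # x_{k+1} ~ [A B][x; u]
A_hat, B_hat = AB_hat[:, :2], AB_hat[:, 2:]
K_hat = -np.linalg.lstsq(X, U, rcond=None)[0].T    # u_k ~ -K x_k

# --- Step 3: generate synthetic data with the estimated model and policy ---
Xs, Us, Xsn = [], [], []
for _ in range(50):
    xs = rng.standard_normal(2)                    # synthetic initial states
    us = -K_hat @ xs
    Xs.append(xs); Us.append(us); Xsn.append(A_hat @ xs + B_hat @ us)

# --- Step 4: estimate reward and value function from the synthetic data ----
# Parameterize V(x) = x'Px and cost x'Qx + u'u, and stack, for each synthetic
# sample, the Bellman equation and the policy-stationarity condition, both of
# which are linear in the entries of the symmetric matrices (P, Q).
def sym_basis(n):
    basis = []
    for i in range(n):
        for j in range(i, n):
            E = np.zeros((n, n)); E[i, j] = E[j, i] = 1.0
            basis.append(E)
    return basis

PB, QB = sym_basis(2), sym_basis(2)
rows, rhs = [], []
for xs, us, xsn in zip(Xs, Us, Xsn):
    # Bellman equation: x'Px - xn'Pxn - x'Qx = u'u
    rows.append([xs @ E @ xs - xsn @ E @ xsn for E in PB] + [-(xs @ E @ xs) for E in QB])
    rhs.append(float(us @ us))
    # Stationarity of the policy: u + B'P xn = 0 (one row per input dimension)
    for k in range(us.shape[0]):
        rows.append([float((B_hat.T @ E @ xsn)[k]) for E in PB] + [0.0] * len(QB))
        rhs.append(float(-us[k]))

theta = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
P_hat = sum(t * E for t, E in zip(theta[:3], PB))
Q_hat = sum(t * E for t, E in zip(theta[3:], QB))
print("estimated state-cost weight Q:\n", np.round(Q_hat, 3))
print("estimated value-function matrix P:\n", np.round(P_hat, 3))
```

As is standard in inverse reinforcement learning, the recovered pair (Q_hat, P_hat) is one member of the family of quadratic rewards and value functions consistent with the observed behavior; it explains the estimated policy but need not coincide with the agent's true weights without further excitation or identifiability conditions of the kind analyzed in the paper.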
