Abstract

Recently proposed adaptive dynamic programming (ADP) tracking controllers assume that the reference trajectory follows time-invariant exo-system dynamics—an assumption that does not hold for many applications. In order to overcome this limitation, we propose a new Q-function that explicitly incorporates a parametrized approximation of the reference trajectory. This allows learning to track a general class of trajectories by means of ADP. Once our Q-function has been learned, the associated controller handles time-varying reference trajectories without the need for further training and independent of exo-system dynamics. After proposing this general model-free off-policy tracking method, we provide an analysis of the important special case of linear quadratic tracking. An example demonstrates that our new method successfully learns the optimal tracking controller and outperforms existing approaches in terms of tracking error and cost.
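The following sketch is not from the paper; it only illustrates, under assumed names and dimensions, how a Q-function of the kind described in the abstract could look in the linear quadratic tracking special case: the parameters describing the reference trajectory enter the Q-function as an additional argument, so the minimizing input adapts to the current reference without retraining.

```python
import numpy as np

# Hypothetical dimensions: state x_k (n), input u_k (m), and a parameter
# vector p_k describing the local reference trajectory (e.g. polynomial
# coefficients). These names and sizes are illustrative assumptions,
# not the paper's notation.
n, m, n_p = 2, 1, 4

def q_value(H, x, u, p):
    """Quadratic Q-function in the augmented vector z = [x; u; p].

    In the linear quadratic tracking special case such a Q-function can be
    written as z^T H z for a symmetric kernel H learned from data
    (e.g. by least-squares, off-policy ADP).
    """
    z = np.concatenate([x, u, p])
    return z @ H @ z

def greedy_input(H, x, p):
    """Input minimizing the quadratic Q-function for given x and p.

    Partitioning H along [x; u; p] and setting the gradient w.r.t. u to
    zero gives u* = -H_uu^{-1} (H_ux x + H_up p).
    """
    H_ux = H[n:n + m, :n]
    H_uu = H[n:n + m, n:n + m]
    H_up = H[n:n + m, n + m:]
    return -np.linalg.solve(H_uu, H_ux @ x + H_up @ p)

# Example with an arbitrary positive definite kernel as a stand-in for a learned one.
rng = np.random.default_rng(0)
A = rng.standard_normal((n + m + n_p, n + m + n_p))
H = A @ A.T + np.eye(n + m + n_p)
x, p = rng.standard_normal(n), rng.standard_normal(n_p)
u_star = greedy_input(H, x, p)
print(q_value(H, x, u_star, p))
```

The point of this construction is that the kernel H is learned once from data; switching to a different reference trajectory only changes the parameter vector p that is plugged into the same Q-function.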

Highlights

  • Adaptive and iterative learning controllers are a powerful tool in the case of unknown or partially unknown system dynamics[1,2,3,4,5] or in multiagent coordination problems.[6]

  • Recently proposed adaptive dynamic programming (ADP) tracking controllers assume that the reference trajectory follows time-invariant exo-system dynamics—an assumption that does not hold for many applications.

  • In order to validate our proposed parametrized reference ADP (PRADP) tracking method, we show simulation results where the reference trajectory is parametrized by means of cubic polynomials (see the sketch below). We compare the results with an ADP tracking method that assumes that the reference can be described by a time-invariant exo-system f_ref(r_k).
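A minimal sketch of such a cubic-polynomial reference parametrization (the function name r_ref, the coefficient layout, and the waypoint fit are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def r_ref(p, k, dt=0.01):
    """Evaluate a cubic-polynomial reference at time step k.

    p = [c0, c1, c2, c3] are the coefficients that parametrize the
    reference over the current segment; t = k * dt is the segment time.
    """
    t = k * dt
    return p[0] + p[1] * t + p[2] * t**2 + p[3] * t**3

# Fit the coefficients of a segment from a few desired waypoints (least squares).
t_way = np.array([0.0, 0.1, 0.2, 0.3])
r_way = np.array([0.0, 0.2, 0.5, 0.6])
V = np.vander(t_way, 4, increasing=True)          # columns [1, t, t^2, t^3]
p, *_ = np.linalg.lstsq(V, r_way, rcond=None)

print([r_ref(p, k) for k in range(0, 31, 10)])    # reproduces the waypoints
```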

Introduction

Adaptive and iterative learning controllers are a powerful tool in the case of unknown or partially unknown system dynamics[1,2,3,4,5] or in multiagent coordination problems.[6] For the data-based tuning of optimal controllers, where the objective is to minimize a cost functional, adaptive dynamic programming (ADP), a reinforcement-learning method, has recently gained extensive attention.[7] In ADP, the controller adapts its behavior based on its interaction with an unknown system and the associated cost signals.[8] The aim is to track a desired reference trajectory optimally w.r.t. a given objective function for a system with unknown dynamics, where no explicit system model is used (i.e., the model-free setting is considered). The objective function quantifies the control objectives and typically penalizes the control effort and/or the deviation of the system state from the desired trajectory. Examples that require the tracking of flexible and time-varying trajectories are the longitudinal
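For illustration (the exact notation and weighting matrices are assumptions, not taken from the paper), a typical objective function of this kind is the discounted quadratic tracking cost

```latex
J = \sum_{k=0}^{\infty} \gamma^{k}
    \Big[ (x_k - r_k)^{\top} Q \,(x_k - r_k) + u_k^{\top} R \, u_k \Big],
\qquad Q \succeq 0,\; R \succ 0,\; 0 < \gamma \le 1,
```

where the first term penalizes the deviation of the state x_k from the reference r_k and the second term penalizes the control effort u_k.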
