Abstract

This chapter presents a new policy iteration technique that solves the continuous-time (CT) LQR problem online without using knowledge of the system's internal dynamics (the system matrix A). The algorithm is derived by writing the value function in integral reinforcement form, which yields a new form of Bellman equation for CT systems. From this equation an integral reinforcement learning (IRL) algorithm is obtained: an adaptive controller that converges online to the optimal LQR controller. IRL is based on an adaptive critic scheme in which the actor applies continuous-time control while the critic incrementally corrects the actor's behavior at discrete moments in time until optimal performance is obtained. The critic evaluates the actor's performance over a period of time and expresses it in a parameterized form; based on this evaluation, the actor's control policy is updated for improved control performance.
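The abstract fixes no notation, so the following uses standard LQR symbols as assumptions: dynamics $\dot{x} = Ax + Bu$, cost weights $Q$ and $R$, quadratic value $V(x) = x^\top P x$, and a reinforcement interval of length $T$. In this notation, the integral reinforcement form referred to above is the CT Bellman equation

$$ V(x(t)) = \int_t^{t+T} \left( x^\top Q x + u^\top R u \right) d\tau \; + \; V(x(t+T)), $$

which contains no system matrix $A$: the critic can solve it for $P$ from measured state and cost data alone, after which the actor improves the policy via $K \leftarrow R^{-1} B^\top P$.

A minimal simulation sketch of this actor-critic loop follows. The two-state plant, interval length, and iteration counts are illustrative assumptions rather than values from the chapter; note that A appears only inside the simulator, never in the learning updates.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import solve_continuous_are

# Hypothetical two-state plant (illustrative values). A is used only
# to *simulate* the plant; the IRL updates below never read it.
A = np.array([[-1.0, 2.0],
              [ 0.0, -0.5]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
T = 0.05              # reinforcement interval length (assumed)
K = np.zeros((1, 2))  # initial stabilizing gain (A itself is stable)

def phi(x):
    # Quadratic basis for symmetric P = [[p0, p1], [p1, p2]]:
    # x'Px = p0*x1^2 + 2*p1*x1*x2 + p2*x2^2
    return np.array([x[0]**2, 2.0 * x[0] * x[1], x[1]**2])

for _ in range(10):                      # policy iteration steps
    rows, costs = [], []
    for _ in range(12):                  # data intervals per step
        x0 = np.random.randn(2)

        def ode(t, z):
            x, u = z[:2], -K @ z[:2]
            running_cost = x @ Q @ x + u @ R @ u
            return np.concatenate([A @ x + B @ u, [running_cost]])

        sol = solve_ivp(ode, [0.0, T], np.concatenate([x0, [0.0]]),
                        rtol=1e-8, atol=1e-8)
        xT, c = sol.y[:2, -1], sol.y[2, -1]
        # IRL Bellman equation: p'(phi(x0) - phi(xT)) = integral cost
        rows.append(phi(x0) - phi(xT))
        costs.append(c)

    # Critic: least-squares fit of P from trajectory data (no A used)
    p, *_ = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)
    P = np.array([[p[0], p[1]], [p[1], p[2]]])
    K = np.linalg.solve(R, B.T @ P)      # actor: policy improvement

print("IRL P:\n", P)
print("ARE P:\n", solve_continuous_are(A, B, Q, R))
```

Each iteration regenerates short trajectories under the current gain, so the least-squares critic evaluates the policy actually in use; the learned P can be checked against the algebraic Riccati solution returned by scipy.linalg.solve_continuous_are.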
