Abstract

This chapter presents an adaptive method based on actor-critic reinforcement learning (RL) for the online solution of the optimal control problem for non-linear continuous-time systems in the state-space form \(\dot{x}(t) = f(x) + g(x)u(t)\). The algorithm, first presented in Vrabie et al. (2008, 2009), Vrabie (2009), and Vrabie and Lewis (2009), solves the optimal control problem without requiring knowledge of the drift dynamics f(x). The method is based on policy iteration (PI), an RL algorithm that iterates between the steps of policy evaluation and policy improvement. PI starts by evaluating the cost of a given admissible initial policy and then uses this information to obtain a new control policy that is improved in the sense of having a smaller associated cost than the previous policy. These two steps are repeated until the policy improvement step no longer changes the current policy, indicating that the optimal control behavior has been obtained.
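To make the two PI steps concrete, the sketch below illustrates one common instantiation of this idea, integral-reinforcement-learning policy iteration, on a scalar example. The system, cost weights, basis functions, and helper names (f, g, phi, collect, policy) are illustrative assumptions, not taken from the chapter. The policy-evaluation step fits a linear-in-parameters critic \(V(x) \approx w^{\top}\phi(x)\) to the integral Bellman relation \(V_i(x(t)) = \int_t^{t+T} \big(Q(x) + u_i R\, u_i\big)\,d\tau + V_i(x(t+T))\), and the improvement step uses \(u_{i+1}(x) = -\tfrac{1}{2} R^{-1} g(x)\, \partial V_i/\partial x\), so the drift f(x) appears only inside the simulated plant, never in the learning update.

```python
# Minimal sketch (not the authors' code) of integral-reinforcement-learning
# policy iteration for a scalar plant xdot = f(x) + g(x)*u.
# Illustrative assumptions: f(x) = -x + x**3 is used only by the simulator,
# the critic is V(x) ~= w1*x**2 + w2*x**4, the cost integrand is x**2 + R*u**2.
import numpy as np

dt = 1e-3   # simulation step
T = 0.05    # length of the reinforcement interval [t, t+T]
R = 1.0     # control weighting

def f(x):          # drift dynamics: known to the simulator only
    return -x + x**3

def g(x):          # input dynamics: assumed known to the learner
    return 1.0

def phi(x):        # critic basis functions
    return np.array([x**2, x**4])

def dphi(x):       # gradient of the basis functions
    return np.array([2*x, 4*x**3])

def policy(x, w):  # policy improvement: u = -(1/2) R^{-1} g(x) dV/dx
    return -0.5 / R * g(x) * (w @ dphi(x))

def collect(w, n_samples):
    """Simulate under the current policy and return the basis-function
    differences and integral reinforcements for the Bellman least squares."""
    rows, rhs = [], []
    for x0 in np.linspace(-0.9, 0.9, n_samples):
        x, r = x0, 0.0
        for _ in range(int(round(T / dt))):   # accumulate cost over [t, t+T]
            u = policy(x, w)
            r += (x**2 + R * u**2) * dt
            x = x + (f(x) + g(x) * u) * dt    # Euler step of the plant
        rows.append(phi(x0) - phi(x))         # w^T (phi(x_t) - phi(x_{t+T})) = r
        rhs.append(r)
    return np.array(rows), np.array(rhs)

# Policy iteration: evaluate the current policy (least squares on the
# integral Bellman equation), then improve it, starting from an admissible
# stabilizing policy (w = [1, 0] gives u = -x for this example).
w = np.array([1.0, 0.0])
for it in range(10):
    A, b = collect(w, n_samples=40)
    w_new, *_ = np.linalg.lstsq(A, b, rcond=None)
    if np.linalg.norm(w_new - w) < 1e-4:      # improvement no longer changes the policy
        break
    w = w_new
print("converged critic weights:", w)
```

Under the stabilizing initial policy the loop terminates once the critic weights, and hence the policy, stop changing, mirroring the stopping condition described above; only g(x) and measured trajectory data enter the update, which is the sense in which the method does not require the drift dynamics.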
