Abstract

In this paper, we present an online reinforcement learning-based solution to the optimal control problem for continuous-time, nonlinear, input-affine systems. The proposed approach contains a concurrent identifier that estimates the time derivatives of the system states at arbitrary points in the state space. The identifier is used to simulate the so-called Bellman error at unvisited points. These simulated errors, together with the errors obtained along the system trajectory, are used to estimate the state-action value function, which is then employed to derive the estimated optimal controller. The designed approach does not explicitly require the input dynamics, which are hard to segregate from the drift dynamics in optimal regulation problems. In addition, the simulated Bellman errors relax the restrictive persistence of excitation condition needed for convergence in deterministic systems. A Lyapunov-based analysis is conducted to derive convergence conditions, and simulation studies demonstrate the effectiveness of the developed control scheme.
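To make the Bellman-error extrapolation idea concrete, the following is a minimal sketch under assumptions not taken from the paper: a quadratic running cost, a hand-picked polynomial basis for the value approximation, and a plain gradient update on the weights. The key point it illustrates is that the identifier's derivative estimate stands in for f(x) + g(x)u, so the error can be evaluated at simulated, unvisited points without explicit knowledge of the drift or input dynamics.

```python
import numpy as np

# Illustrative sketch only; basis, cost, and update rule are assumptions,
# not the paper's actual design.

def phi(x):
    """Assumed polynomial basis for the value approximation V(x) ~ W^T phi(x)."""
    return np.array([x[0]**2, x[0]*x[1], x[1]**2])

def phi_grad(x):
    """Jacobian of phi with respect to x (3 features x 2 states)."""
    return np.array([[2*x[0], 0.0],
                     [x[1],   x[0]],
                     [0.0,    2*x[1]]])

def running_cost(x, u, Q=np.eye(2), R=np.eye(1)):
    """Assumed quadratic cost r(x, u) = x^T Q x + u^T R u."""
    return x @ Q @ x + u @ R @ u

def bellman_error(W, x, u, x_dot_hat):
    """Continuous-time Bellman error delta = W^T (dphi/dx) x_dot_hat + r(x, u).

    x_dot_hat is the identifier's estimate of the state derivative, evaluated
    either along the trajectory or at an arbitrary extrapolation point.
    """
    return W @ (phi_grad(x) @ x_dot_hat) + running_cost(x, u)

def update_weights(W, samples, lr=1e-2):
    """Gradient step pooling on-trajectory and simulated Bellman errors.

    Mixing both kinds of samples is what relaxes the persistence of
    excitation requirement on the visited trajectory alone.
    """
    grad = np.zeros_like(W)
    for x, u, x_dot_hat in samples:  # mix of visited and simulated points
        delta = bellman_error(W, x, u, x_dot_hat)
        grad += delta * (phi_grad(x) @ x_dot_hat)
    return W - lr * grad / len(samples)
```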
