Abstract
In this chapter, we first establish error bounds for adaptive dynamic programming (ADP) algorithms that solve undiscounted infinite-horizon optimal control problems for discrete-time deterministic nonlinear dynamical systems. Based on a new error condition, we derive error bounds for approximate value iteration, approximate policy iteration, and approximate optimistic policy iteration, and show that the iterative approximate value function converges to a finite neighborhood of the optimal value function under mild conditions. We then establish an error bound on the Q-function in approximate policy iteration for optimal control of unknown discounted discrete-time nonlinear dynamical systems, and develop an iterative ADP algorithm that uses a Q-function depending on both the state and the action to solve the nonlinear optimal control problem. Function approximation structures such as neural networks are used to approximate the Q-function and the control policy. These results provide theoretical guarantees for using neural network approximation to solve optimal control problems for nonlinear dynamical systems.
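As a rough illustration of the kind of approximate Q-function iteration the chapter analyzes, the sketch below fits a quadratic feature model (a stand-in for a neural network) to Bellman targets for a simple scalar system. The dynamics f, cost U, grids, and feature map are all assumptions made for illustration, not the chapter's actual setup; the least-squares residual at each iteration plays the role of the approximation error bounded in the theory.

```python
# Minimal sketch of approximate Q-function iteration (not the chapter's
# algorithm): a least-squares fit over quadratic features stands in for
# neural-network training of the Q-function.
import numpy as np

def f(x, u):
    return 0.9 * x + 0.1 * u          # assumed scalar dynamics, for illustration

def U(x, u):
    return x**2 + u**2                # assumed quadratic stage cost

def phi(x, u):
    # Quadratic feature basis parameterizing Q(x, u) = phi(x, u) @ w.
    return np.array([x * x, x * u, u * u, x, u, 1.0])

X_GRID = np.linspace(-2.0, 2.0, 21)   # state samples for the regression fit
U_GRID = np.linspace(-2.0, 2.0, 21)   # action grid for the inner minimization

w = np.zeros(6)                       # Q_0 = 0
for i in range(30):                   # approximate Q-iteration
    rows, targets = [], []
    for x in X_GRID:
        for u in U_GRID:
            x_next = f(x, u)
            # Bellman target: U(x, u) + min_{u'} Q_i(x_next, u')
            q_next = min(phi(x_next, up) @ w for up in U_GRID)
            rows.append(phi(x, u))
            targets.append(U(x, u) + q_next)
    # Least-squares projection onto the feature space; its residual is the
    # per-iteration approximation error.
    w, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)

def policy(x):
    # Greedy policy extracted from the converged approximate Q-function.
    return min(U_GRID, key=lambda u: phi(x, u) @ w)

print("u(1.0) =", policy(1.0))
```

Because this toy system is open-loop stable with a quadratic cost, the undiscounted Q-iteration above converges from Q_0 = 0; a discount factor would be multiplied into the q_next term for the discounted setting treated in the second part of the chapter.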