Abstract

Optimal control theory deals with finding the policy that minimizes a discounted infinite-horizon quadratic cost function. Finding the optimal control policy requires solving the Hamilton-Jacobi-Bellman (HJB) equation, i.e., finding the value function that satisfies the Bellman equation. However, the HJB equation is a partial differential equation that is difficult to solve for a nonlinear system. This paper employs the approximate dynamic programming method to solve the HJB equation for deterministic nonlinear discrete-time systems with continuous state and action spaces. An approximate solution of the HJB equation is found by a policy iteration algorithm structured as an actor-critic architecture. The control policy and value function are approximated by function approximators, namely neural networks expressed as linear combinations of linearly independent basis functions. A gradient descent optimization algorithm tunes the weights of the actor and critic networks. The control algorithm is implemented for the cart-pole inverted pendulum system, and the effectiveness of the approach is demonstrated in simulations.
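
As a rough illustration of the actor-critic scheme the abstract describes, the Python sketch below applies it to a hypothetical one-dimensional plant. Everything concrete here is an assumption for illustration, not the paper's setup: the plant f, the stage cost, the polynomial bases phi and psi, and the step sizes are invented, and the per-sample interleaving of critic and actor updates is a generalized-policy-iteration approximation of the paper's policy iteration. The critic weights descend the squared Bellman residual; the actor weights descend the one-step cost-to-go through the model by the chain rule.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-dimensional nonlinear discrete-time plant
# (illustrative only; not the paper's cart-pole model).
def f(x, u):
    return 0.8 * np.sin(x) + u

def stage_cost(x, u):          # quadratic stage cost with Q = R = 1
    return x**2 + u**2

gamma = 0.95                   # discount factor

# Linearly independent basis functions for the two approximators.
def phi(x):                    # critic features: V(x) ~= w_c @ phi(x)
    return np.array([x**2, x**4])

def dphi_dx(x):                # feature gradient, used by the actor update
    return np.array([2.0 * x, 4.0 * x**3])

def psi(x):                    # actor features: u(x) ~= w_a @ psi(x)
    return np.array([x, x**3])

w_c = np.zeros(2)              # critic weights
w_a = np.zeros(2)              # actor weights
alpha_c, alpha_a = 0.05, 0.01  # gradient-descent step sizes

for _ in range(20000):
    x = rng.uniform(-1.0, 1.0)         # sample a training state
    u = w_a @ psi(x)                   # action from the current policy
    xn = f(x, u)                       # one-step model rollout

    # Critic: gradient descent on the squared Bellman residual
    # delta = c(x, u) + gamma * V(x') - V(x).
    delta = stage_cost(x, u) + gamma * (w_c @ phi(xn)) - (w_c @ phi(x))
    w_c += alpha_c * delta * phi(x)

    # Actor: descend dQ/du = 2u + gamma * dV/dx' * df/du (df/du = 1 here),
    # propagated to the actor weights through the chain rule.
    dQ_du = 2.0 * u + gamma * (w_c @ dphi_dx(xn))
    w_a -= alpha_a * dQ_du * psi(x)

print("critic weights:", w_c)
print("actor weights: ", w_a)

Because the critic is linear in its weights, the critic update is an ordinary least-mean-squares step on the temporal-difference error; the even-powered critic basis encodes the prior that the cost-to-go is symmetric about the origin for this symmetric plant.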
