Abstract

In this paper, approximate policy iteration (API) is studied for solving undiscounted optimal control problems. A discrete-time system with a continuous state space and a finite action set is considered. Because an approximation technique must be used for the continuous state space, approximation errors enter the computation and disturb the convergence of the original policy iteration. We analyze and prove the convergence of API for undiscounted optimal control. An iterative method is used to implement approximate policy evaluation, and we show that the error between the approximate and exact value functions is bounded. With the finite action set, the greedy policy in the policy improvement step is then generated directly. Our main theorem proves that, if a sufficiently accurate approximator is used, API converges to the optimal policy. For implementation, we introduce a fuzzy approximator and verify its performance on the puddle world problem.
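
The sketch below illustrates the overall API scheme described in the abstract: iterative approximate policy evaluation followed by greedy policy improvement over a finite action set. It is a minimal illustration under stated assumptions, not the paper's method: a nearest-neighbor approximator over sampled states stands in for the fuzzy approximator, the task is assumed to have absorbing goal states so undiscounted total costs stay finite, and `step` and `cost` are hypothetical user-supplied functions.

```python
import numpy as np

def approximate_policy_iteration(states, actions, step, cost,
                                 n_eval_iters=200, n_policy_iters=20):
    """Sketch of approximate policy iteration with a finite action set.

    states: (N, d) array of sampled states used as approximator centers.
    actions: list of admissible actions (finite action set).
    step(s, a): hypothetical system model returning the next state.
    cost(s, a): hypothetical stage cost (undiscounted; assumed zero at
                absorbing goal states so total cost is finite).
    Returns a greedy policy (action index per sampled state) and its value.
    """
    V = np.zeros(len(states))            # approximate value at sampled states
    policy = np.zeros(len(states), dtype=int)

    def V_hat(s):
        # Nearest-neighbor approximator over the sampled states; a stand-in
        # for the fuzzy approximator used in the paper.
        return V[np.argmin(np.linalg.norm(states - s, axis=1))]

    for _ in range(n_policy_iters):
        # Approximate policy evaluation: iterative Bellman backups under
        # the current policy, evaluated through the approximator.
        for _ in range(n_eval_iters):
            V = np.array([cost(s, actions[policy[i]])
                          + V_hat(step(s, actions[policy[i]]))
                          for i, s in enumerate(states)])
        # Policy improvement: greedy minimization over the finite action set.
        new_policy = np.array([
            np.argmin([cost(s, a) + V_hat(step(s, a)) for a in actions])
            for s in states])
        if np.array_equal(new_policy, policy):
            break                         # greedy policy no longer changes
        policy = new_policy
    return policy, V
```

Because the action set is finite, the improvement step is an explicit minimization over all actions rather than a continuous optimization, which is the structure the abstract refers to when it says the greedy policy is generated directly.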
