Abstract

This paper studies approximate policy iteration (API) for solving undiscounted optimal control problems. A discrete-time system with a continuous state space and a finite action set is considered. Because an approximation technique is required for the continuous state space, approximation errors enter the computation and disturb the convergence of exact policy iteration. We analyze and prove the convergence of API for undiscounted optimal control. Approximate policy evaluation is implemented with an iterative method, and we show that the error between the approximate and exact value functions is bounded. With the finite action set, the greedy policy in the policy improvement step is generated directly. Our main theorem proves that, if a sufficiently accurate approximator is used, API converges to the optimal policy. For implementation, we introduce a fuzzy approximator and verify its performance on the puddle world problem.
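The following is a minimal sketch of the API structure described above: iterative approximate policy evaluation with a function approximator, followed by greedy policy improvement by direct enumeration over the finite action set. The 1-D dynamics, the stage cost, the normalized-Gaussian (fuzzy-style) basis, and all parameters are illustrative assumptions, not the paper's exact setup or its puddle world benchmark.

```python
# Sketch of approximate policy iteration for an undiscounted (total-cost)
# problem with a continuous state and a finite action set.
import numpy as np

ACTIONS = np.array([-1.0, 0.0, 1.0])      # finite action set (assumed)
CENTERS = np.linspace(-5.0, 5.0, 21)      # basis-function centers (assumed)
WIDTH = 0.5

def features(x):
    """Normalized Gaussian memberships: a simple fuzzy-style approximator."""
    phi = np.exp(-0.5 * ((x - CENTERS) / WIDTH) ** 2)
    return phi / phi.sum()

def step(x, a):
    """Assumed toy dynamics and stage cost; the goal is the origin."""
    x_next = np.clip(x + 0.1 * a, -5.0, 5.0)
    cost = 0.1 if abs(x_next) > 0.1 else 0.0   # unit cost until near the goal
    return x_next, cost

def value(w, x):
    return features(x) @ w

def policy_evaluation(policy, w, states, sweeps=200):
    """Iterative approximate policy evaluation: fitted one-step backups under
    a fixed policy (undiscounted, so the policy should reach the goal)."""
    Phi = np.array([features(x) for x in states])
    for _ in range(sweeps):
        targets = []
        for x in states:
            x_next, c = step(x, ACTIONS[policy(x)])
            targets.append(c + value(w, x_next))
        # least-squares fit of the approximator to the backup targets
        w, *_ = np.linalg.lstsq(Phi, np.array(targets), rcond=None)
    return w

def greedy_action(w, x):
    """Policy improvement: with a finite action set, the greedy action is
    found by direct enumeration."""
    q = []
    for a in ACTIONS:
        x_next, c = step(x, a)
        q.append(c + value(w, x_next))
    return int(np.argmin(q))

def api(iterations=10):
    states = np.linspace(-5.0, 5.0, 101)   # sample states for fitting (assumed)
    w = np.zeros(len(CENTERS))
    # initial proper policy: always move toward the origin
    policy = lambda x: 0 if x > 0 else (2 if x < 0 else 1)
    for _ in range(iterations):
        w = policy_evaluation(policy, w, states)
        policy = lambda x, w=w: greedy_action(w, x)
    return policy, w

if __name__ == "__main__":
    policy, w = api()
    print("greedy action at x=2.0:", ACTIONS[policy(2.0)])
```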
