Least Square Policy Iteration in Reinforcement Learning

Bin Zhao, Haifei Zhang, Hailong Deng, Ying Hong

doi:10.2991/lemcs-15.2015.272

Abstract

Policy iteration is the core procedure for solving reinforcement learning problems. It evaluates a policy by estimating its value function and then derives an improved policy from that value function. In classic policy iteration, value functions and policies are stored as tables and are exact. However, such representations are not suitable for reinforcement learning problems with large or continuous spaces, e.g. continuous action spaces. Approximate policy iteration is therefore often used to solve these problems. Constructing an approximate value function for the current policy is an important part of approximate policy iteration. The policy is not represented explicitly; instead, its action is computed on demand from the approximate value function. Least-squares methods are sample-efficient when solving for the parameters of the approximate value function: the larger the sample size, the faster the solution is approached. This paper discusses online least-squares policy iteration algorithms in reinforcement learning.

Keywords: Policy iteration; Least Square; Reinforcement learning; Sample-effective; Policy improvement
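
The evaluate-then-improve loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it uses the batch (offline) form of least-squares policy iteration with an LSTD-Q style solve rather than the online variant the paper discusses. The one-hot features, the 4-state chain, the reward, the regularization term `reg`, and all function names are assumptions made only for illustration.

```python
import numpy as np

def features(s, a, n_states, n_actions):
    """One-hot features over (state, action) pairs (illustrative assumption)."""
    phi = np.zeros(n_states * n_actions)
    phi[s * n_actions + a] = 1.0
    return phi

def greedy_action(w, s, n_states, n_actions):
    """The policy is implicit: its action is computed on demand from the approximate Q."""
    q = [features(s, a, n_states, n_actions) @ w for a in range(n_actions)]
    return int(np.argmax(q))

def lstdq(samples, w, n_states, n_actions, gamma, reg=1e-6):
    """Policy evaluation: least-squares solve of A w' = b for the policy greedy in w."""
    k = n_states * n_actions
    A = reg * np.eye(k)  # small ridge term (assumption) keeps A well conditioned
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        phi = features(s, a, n_states, n_actions)
        a_next = greedy_action(w, s_next, n_states, n_actions)
        phi_next = features(s_next, a_next, n_states, n_actions)
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A, b)

def lspi(samples, n_states, n_actions, gamma=0.9, max_iter=50, tol=1e-8):
    """Alternate evaluation and (implicit) greedy improvement until the weights settle."""
    w = np.zeros(n_states * n_actions)
    for _ in range(max_iter):
        w_new = lstdq(samples, w, n_states, n_actions, gamma)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w

def step(s, a, n_states=4):
    """Hypothetical chain: action 0 moves left, action 1 moves right; reward 1 on reaching the last state."""
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return reward, s_next

# One sample per (state, action) pair of the toy chain.
samples = []
for s in range(4):
    for a in range(2):
        r, s_next = step(s, a)
        samples.append((s, a, r, s_next))

w = lspi(samples, n_states=4, n_actions=2)
policy = [greedy_action(w, s, 4, 2) for s in range(4)]
print(policy)
```

With one-hot features the least-squares solve reduces to tabular policy evaluation, so the sketch only shows the structure of the loop; a practical approximator would replace `features` with a compact basis over a large or continuous space.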

Publication Date: Jan 1, 2015
Citations: 7
License type: cc-by-nc

Similar Papers

Discrete-Time Non-Zero-Sum Games With Completely Unknown Dynamics.
Ruizhuo Song ... Frank L Lewis
IEEE Transactions on Cybernetics | VOL. 51
18 May 2021

Error Bounds of Adaptive Dynamic Programming Algorithms
Derong Liu ... Ding Wang
01 Jan 2017

An Adaptive Policy Evaluation Network Based on Recursive Least Squares Temporal Difference With Gradient Correction
Dazi Li ... Yuting Wang
IEEE Access | VOL. 6
01 Jan 2018

A convergent recursive least squares approximate policy iteration algorithm for multi-dimensional Markov decision process with continuous state and action spaces
Jun Ma ... Warren B Powell
01 Mar 2009
