Abstract

We consider policy evaluation algorithms within the context of infinite-horizon dynamic programming problems with discounted cost. We focus on discrete-time dynamic systems with a large number of states, and we discuss two methods, both of which use simulation, temporal differences, and linear cost function approximation. The first method is a new gradient-like algorithm involving least-squares subproblems and a diminishing stepsize, based on the λ-policy iteration method of Bertsekas and Ioffe. The second method is the LSTD(λ) algorithm recently proposed by Boyan, which for λ = 0 coincides with the linear least-squares temporal-difference algorithm of Bradtke and Barto. At present, the only available convergence result is that of Bradtke and Barto for the LSTD(0) algorithm. Here, we strengthen this result by showing that LSTD(λ) converges with probability 1 for every λ ∈ [0, 1].
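To make the LSTD(λ) recursion concrete, the following is a minimal sketch of how the simulation-based estimates might be accumulated from a single trajectory, using linear features and a discounted-cost setting; the names (lstd_lambda, phi, trajectory) are illustrative and not taken from the paper, and the sketch omits the regularization and averaging details needed in practice.

```python
# Minimal LSTD(lambda) sketch (illustrative, NumPy only).
import numpy as np

def lstd_lambda(trajectory, phi, gamma, lam, d):
    """Estimate weights w so that phi(s) @ w approximates the discounted cost-to-go.

    trajectory: iterable of (state, cost, next_state) transitions from one simulated run
    phi:        feature map, state -> length-d NumPy vector
    gamma:      discount factor in (0, 1)
    lam:        trace parameter lambda in [0, 1]; lam = 0 recovers LSTD(0)
    """
    A = np.zeros((d, d))
    b = np.zeros(d)
    z = np.zeros(d)  # eligibility trace
    for s, cost, s_next in trajectory:
        z = gamma * lam * z + phi(s)
        A += np.outer(z, phi(s) - gamma * phi(s_next))
        b += z * cost
    return np.linalg.solve(A, b)  # assumes A is nonsingular
```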
