Abstract
Approximate value iteration is a simple algorithm that combats the curse of dimensionality in dynamic programs by approximating iterates of the classical value iteration algorithm in a spirit reminiscent of statistical regression. Each iteration of this algorithm can be viewed as an application of a modified dynamic programming operator to the current iterate. The hope is that the iterates converge to a fixed point of this operator, which will then serve as a useful approximation of the optimal value function. In this paper, we show that, in general, the modified dynamic programming operator need not possess a fixed point; therefore, approximate value iteration should not be expected to converge. We then propose a variant of approximate value iteration for which the associated operator is guaranteed to possess at least one fixed point. This variant is motivated by studies of temporal-difference (TD) learning; here, the existence of fixed points implies the existence of stationary points for the ordinary differential equation approximated by a version of TD that incorporates exploration.
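To make the setup concrete, the following is a minimal sketch of approximate value iteration on a toy MDP, assuming linear function approximation with a least-squares projection. Each iterate applies the composed operator Pi T, where T is the classical dynamic programming operator and Pi projects onto the span of the feature vectors. The specific MDP, the feature matrix, and all variable names are illustrative assumptions, not from the paper.

```python
import numpy as np

np.random.seed(0)

n_states, n_actions = 5, 2
gamma = 0.9

# Hypothetical random MDP: transition kernel P[a, s, s'] and rewards r[s, a].
P = np.random.dirichlet(np.ones(n_states), size=(n_actions, n_states))
r = np.random.rand(n_states, n_actions)

# Feature matrix Phi: each state is represented by 2 features (illustrative).
Phi = np.random.rand(n_states, 2)

def bellman_operator(V):
    """Classical DP operator T: (TV)(s) = max_a [ r(s,a) + gamma * E[V(s')] ]."""
    Q = r + gamma * np.einsum('ast,t->sa', P, V)
    return Q.max(axis=1)

def project(V):
    """Least-squares projection Pi of V onto the span of the features."""
    w, *_ = np.linalg.lstsq(Phi, V, rcond=None)
    return Phi @ w

# Approximate value iteration: repeatedly apply the modified operator Pi T.
V = np.zeros(n_states)
for _ in range(200):
    V = project(bellman_operator(V))

print(V)
```

Note that, consistent with the paper's negative result, nothing guarantees this iteration converges to a fixed point of Pi T; for some MDP/feature combinations the iterates can oscillate or diverge, and the loop above simply reports the iterate after a fixed number of steps.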