Abstract
We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal Q-factors. Instead of policy evaluation by solving a linear system of equations, our algorithm involves the (possibly inexact) solution of an optimal stopping problem. This problem can be solved with simple Q-learning iterations when a lookup table representation is used; it can also be solved with the Q-learning algorithm of Tsitsiklis and Van Roy [TsV99] when feature-based Q-factor approximations are used. In its exact/lookup table form, our algorithm admits asynchronous and stochastic iterative implementations, in the spirit of asynchronous/modified policy iteration, with lower overhead than existing Q-learning schemes. Furthermore, for large-scale problems, where linear basis function approximations and simulation-based temporal difference implementations are used, our algorithm effectively resolves the difficulties of existing schemes caused by inadequate exploration.
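To give a rough sense of the kind of "simple Q-learning iterations" for an optimal stopping problem mentioned above, the following is a minimal, hypothetical sketch (the model, variable names, and stepsize rule are illustrative assumptions, not the paper's formulation): a tabular Q-learning iteration in which, from each state, one may either stop at a known cost or continue under a fixed policy, with Q[i] estimating the cost of continuing from state i.

```python
# Illustrative sketch only; the problem data below are randomly generated
# stand-ins, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

n_states = 5                               # small example state space (assumed)
alpha = 0.9                                # discount factor (assumed)
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)          # transitions under the current policy (assumed)
g = rng.random((n_states, n_states))       # one-stage costs g(i, j) (assumed)
stop_cost = rng.random(n_states)           # cost of stopping at each state (assumed)

Q = np.zeros(n_states)                     # Q-factor of "continue" at each state
visits = np.zeros(n_states)

state = 0
for _ in range(200_000):
    # Simulate one transition under the current policy.
    next_state = rng.choice(n_states, p=P[state])
    # Sampled target: one-stage cost plus the discounted minimum of
    # stopping at the next state or continuing from it.
    target = g[state, next_state] + alpha * min(stop_cost[next_state], Q[next_state])
    # Diminishing stepsize for the visited state (asynchronous, lookup table update).
    visits[state] += 1
    Q[state] += (target - Q[state]) / visits[state]
    state = next_state

print("Estimated continuation Q-factors:", np.round(Q, 3))
```

In this sketch each update touches a single state, which is what makes asynchronous, lookup table implementations of the optimal stopping substep inexpensive; how this substep is embedded in the policy iteration-like scheme is developed in the paper itself.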