Generalized Second-Order Value Iteration in Markov Decision Processes

Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar

doi:10.1109/tac.2021.3112851


Open Access

https://doi.org/10.1109/tac.2021.3112851


Abstract

Value iteration is a fixed-point iteration technique used to obtain the optimal value function and policy in a discounted reward Markov decision process (MDP). Here, a contraction operator is constructed and applied repeatedly to arrive at the optimal solution. Value iteration is a first-order method and, therefore, may take a large number of iterations to converge to the optimal solution. Successive relaxation is a popular technique that can be applied to solve a fixed-point equation. It has been shown in the literature that, under a special structure of the MDP, the successive overrelaxation technique computes the optimal value function faster than standard value iteration. In this article, we propose a second-order value iteration procedure that is obtained by applying the Newton–Raphson method to the successive relaxation value iteration scheme. We prove that our algorithm converges globally and asymptotically to the optimal solution, and we show that its convergence is second order. Through experiments, we demonstrate the effectiveness of our proposed approach.
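To make the first two ideas in the abstract concrete, here is a minimal Python sketch comparing standard value iteration with a successive relaxation variant. Everything in it is an assumption for illustration and is not taken from the article: the two-state MDP (`P`, `R`, `GAMMA`), the helper names, and the relaxed operator w·T(V) + (1 − w)·V with the bound w* = 1 / (1 − γ · min p(s|s,a)), which is the form commonly used in the successive overrelaxation literature. The "special structure" mentioned above is read here as every state having a positive self-transition probability under every action, which is what allows w > 1. The article's second-order (Newton–Raphson) step is not reproduced.

```python
# Illustrative sketch only: a hypothetical 2-state, 2-action MDP.
GAMMA = 0.9  # discount factor (assumed)

# P[s][a][t] = probability of moving from state s to state t under action a.
# Every (s, a) pair has a positive self-loop probability, so w > 1 is allowed.
P = [
    [[0.8, 0.2], [0.5, 0.5]],
    [[0.3, 0.7], [0.4, 0.6]],
]
# R[s][a] = expected one-step reward (assumed values).
R = [[1.0, 0.0], [0.5, 2.0]]

N_STATES = len(P)


def bellman(V):
    """Standard Bellman optimality operator T applied to V."""
    return [
        max(
            R[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in range(N_STATES))
            for a in range(len(R[s]))
        )
        for s in range(N_STATES)
    ]


def relaxed_bellman(V, w):
    """Successive relaxation operator: w * T(V) + (1 - w) * V."""
    TV = bellman(V)
    return [w * TV[s] + (1.0 - w) * V[s] for s in range(N_STATES)]


def iterate(op, tol=1e-10, max_iter=100_000):
    """Apply `op` repeatedly from V = 0 until successive iterates differ by < tol."""
    V = [0.0] * N_STATES
    for k in range(1, max_iter + 1):
        V_new = op(V)
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new, k
        V = V_new
    return V, max_iter


# Largest relaxation parameter that keeps the relaxed operator a contraction
# (assumed bound from the successive overrelaxation literature).
min_self_loop = min(P[s][a][s] for s in range(N_STATES) for a in range(len(R[s])))
W_STAR = 1.0 / (1.0 - GAMMA * min_self_loop)

V_vi, k_vi = iterate(bellman)
V_sor, k_sor = iterate(lambda V: relaxed_bellman(V, W_STAR))

print("value iteration:      ", V_vi, "iterations:", k_vi)
print("successive relaxation:", V_sor, "iterations:", k_sor)
```

Both runs target the same fixed point, because w·T(V) + (1 − w)·V = V holds exactly when T(V) = V. In this sketch the relaxed operator has contraction factor 1 − w(1 − γ), which is smaller than γ whenever w > 1, so it should need fewer iterations on this toy problem.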


Journal: IEEE Transactions on Automatic Control
Publication Date: Aug 1, 2022
Citations: 1

Similar Papers

Successive Over-Relaxation Q-Learning
Chandramouli Kamanchi ... Raghuram Bharadwaj Diddigi
IEEE Control Systems Letters | VOL. 4
01 Jan 2020

Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
Mohammad Gheshlaghi Azar ... Hilbert J Kappen
Machine Learning | VOL. 91
14 May 2013

On the convergence of techniques that improve value iteration
Marek Grzes ... Jesse Hoey
01 Aug 2013

Countable state Markov decision processes with unbounded jump rates and discounted cost: optimality equation and approximations
H Blok ... F M Spieksma
Advances in Applied Probability | VOL. 47
01 Dec 2015
