Abstract

Research indicates that perturbation analysis (PA), Markov decision processes (MDPs), and reinforcement learning (RL) are three closely related areas in the optimization of discrete event dynamic systems. In particular, it has been shown that policy iteration in fact chooses, for the next iteration, the policy with the steepest performance gradient (as provided by PA). This sensitivity-based view of MDPs leads to several new research topics. We propose implementing policy iteration based on performance gradients. The approach is particularly useful when the actions at different states are correlated, so that standard policy iteration cannot be applied. We illustrate the main ideas with an M/G/1/N queue example and identify some topics for further research.
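
As a rough illustration of the claim above, the sketch below runs average-reward policy iteration on a small finite MDP, where the improvement step picks, at every state, the action with the largest value of r(i,a) + sum_j p(j|i,a) g(j) computed from the performance potentials g of the current policy; by the performance-difference formula this is the direction of steepest performance gradient. This is a minimal sketch under assumed conventions: the transition matrices, reward vectors, and helper names (stationary_dist, potentials, policy_iteration) are illustrative assumptions, not code from the paper, and the M/G/1/N example of the paper is not reproduced here.

```python
# Minimal sketch (illustrative, not from the paper): average-reward policy
# iteration whose improvement step follows the steepest performance gradient
# given by the potentials of the current policy.
import numpy as np

def stationary_dist(P):
    """Stationary distribution pi of an ergodic transition matrix P (pi P = pi, pi e = 1)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def potentials(P, r):
    """Performance potentials g solving (I - P + e pi^T) g = r - eta e."""
    n = P.shape[0]
    pi = stationary_dist(P)
    eta = pi @ r                      # long-run average reward of the current policy
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), r - eta)
    return g, eta

def policy_iteration(P_a, r_a, n_iter=100):
    """P_a[a] is the transition matrix and r_a[a] the reward vector under action a.
    Each state switches to an action with a larger value of r(i,a) + sum_j p(j|i,a) g(j),
    i.e. moves in the steepest-ascent direction given by the current potentials."""
    n_states = P_a[0].shape[0]
    policy = np.zeros(n_states, dtype=int)
    eta = 0.0
    for _ in range(n_iter):
        P = np.array([P_a[policy[i]][i] for i in range(n_states)])
        r = np.array([r_a[policy[i]][i] for i in range(n_states)])
        g, eta = potentials(P, r)
        scores = np.array([r_a[a] + P_a[a] @ g for a in range(len(P_a))])
        # keep the current action unless another is strictly better (avoids cycling on ties)
        new_policy = policy.copy()
        for i in range(n_states):
            best = int(scores[:, i].argmax())
            if scores[best, i] > scores[policy[i], i] + 1e-10:
                new_policy[i] = best
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, eta

if __name__ == "__main__":
    # Hypothetical 2-state, 2-action MDP used only to exercise the sketch.
    P_a = [np.array([[0.9, 0.1], [0.2, 0.8]]),
           np.array([[0.5, 0.5], [0.6, 0.4]])]
    r_a = [np.array([1.0, 0.0]), np.array([0.5, 2.0])]
    print(policy_iteration(P_a, r_a))
```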
