Abstract

We integrate two numerical procedures for solving the average reward Markov decision process (MDP): standard successive approximations and modified policy iteration with reward revision. Reward revision is the process of revising the reward structure of a second, more computationally desirable MDP so as to produce, in the limit, an optimality equation whose fixed point is identical to that associated with the original MDP. A numerical study indicates that for MDPs having a non-sparse transition structure with a small number of relatively large entries per row, the addition of reward revision can yield significant computational benefits.
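
The abstract does not reproduce the algorithms themselves. For orientation only, the sketch below shows standard successive approximations for the average-reward case (relative value iteration), assuming a finite MDP given by a transition array `P` and a reward array `r` (both names hypothetical); the reward-revision modification studied in the paper is not shown.

```python
import numpy as np

def relative_value_iteration(P, r, ref=0, tol=1e-8, max_iter=100_000):
    """Illustrative relative value iteration for an average-reward MDP.

    P: (A, S, S) array, P[a, s, s'] = transition probability.
    r: (S, A) array of one-step rewards.
    Returns (gain, policy, v): estimated optimal average reward,
    a greedy deterministic policy, and the relative value function.
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        # One-step lookahead: Q[s, a] = r[s, a] + sum_{s'} P[a, s, s'] v[s']
        Q = r + np.einsum("asp,p->sa", P, v)
        w = Q.max(axis=1)
        delta = w - v
        # Span-seminorm stopping rule used for average-reward problems.
        if delta.max() - delta.min() < tol:
            gain = 0.5 * (delta.max() + delta.min())
            return gain, Q.argmax(axis=1), v
        v = w - w[ref]  # subtract a reference state's value to keep iterates bounded
    raise RuntimeError("did not converge within max_iter iterations")

# Example with purely illustrative numbers: a 2-state, 2-action MDP.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.6, 0.4]]])  # action 1
r = np.array([[5.0, 10.0],
              [-1.0, 2.0]])
gain, policy, v = relative_value_iteration(P, r)
```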
