Abstract

We consider discrete-time Markov Decision Processes (MDPs) with finite state and action spaces under the average reward optimality criterion. The decomposition theory of Ross and Varadarajan [11] leads to a natural partition of the state space into strongly communicating classes and a set of states that are transient under all stationary strategies. An optimal pure strategy can then be obtained from an optimal strategy for a smaller, aggregated MDP, which makes the decomposition an efficient method for solving large-scale MDPs. In this paper, we restrict attention to deterministic MDPs and construct a simple algorithm, based on graph theory, for determining an aggregated optimal policy. For MDPs without cycles, we propose an algorithm for computing aggregated optimal strategies; for the general case, we propose new, improved algorithms for the same task.
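To make the graph-theoretic viewpoint concrete: for a deterministic MDP, the strongly communicating classes can be read off the one-step transition graph (an edge s → s' whenever some action moves s to s') as the strongly connected components that contain at least one cycle; singleton components without a self-loop are transient under every stationary strategy. The sketch below is illustrative, not the paper's algorithm: it assumes the MDP is given as a dict mapping each state to its action → successor map with every successor itself a state key, and the function name `strongly_communicating_classes` and the toy states are invented for this example.

```python
from collections import defaultdict

def strongly_communicating_classes(mdp):
    """Split a deterministic MDP's states into strongly communicating
    classes (SCCs of the transition graph that contain a cycle) and
    states transient under every stationary strategy.

    mdp: dict mapping state -> {action: successor_state}.
    """
    # One-step reachability graph: s -> s' if some action moves s to s'.
    succ = {s: set(acts.values()) for s, acts in mdp.items()}

    # Kosaraju's algorithm, pass 1: record DFS finishing order.
    order, seen = [], set()
    for start in mdp:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(succ[start]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(succ[v])))
                    break
            else:
                order.append(u)
                stack.pop()

    # Pass 2: DFS on the reversed graph, in reverse finishing order.
    pred = defaultdict(set)
    for s, targets in succ.items():
        for t in targets:
            pred[t].add(s)
    assigned, components = set(), []
    for s in reversed(order):
        if s in assigned:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u in assigned:
                continue
            assigned.add(u)
            comp.add(u)
            stack.extend(pred[u] - assigned)
        components.append(comp)

    # A singleton SCC with no self-loop can never be revisited, so it
    # is transient under every stationary strategy.
    communicating = [c for c in components
                     if len(c) > 1 or next(iter(c)) in succ[next(iter(c))]]
    covered = set().union(*communicating) if communicating else set()
    return communicating, set(mdp) - covered

# Toy usage: s0 can reach the cycle {s1, s2} but never return to itself.
mdp = {
    "s0": {"a": "s1", "b": "s2"},
    "s1": {"a": "s2"},
    "s2": {"a": "s1"},
}
classes, transient = strongly_communicating_classes(mdp)
# classes -> [{"s1", "s2"}], transient -> {"s0"}
```

Aggregating each such class into a single node is what yields the smaller MDP on which an optimal pure strategy can be computed.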
