Abstract

As classical methods are intractable for Markov decision processes (MDPs) with large state spaces, decomposition and aggregation techniques are very useful for coping with large problems. These techniques are, in general, special cases of the classic divide-and-conquer framework: a large, unwieldy problem is split into smaller components, and the parts are solved in order to construct the global solution. This paper reviews most of the decomposition approaches encountered in the associated literature over the past two decades, weighing their pros and cons. We consider several categories of MDPs (average, discounted, and weighted MDPs), and we briefly present a variety of methodologies to find or approximate optimal strategies.
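The divide-and-conquer idea can be illustrated with a minimal sketch: a discounted MDP whose transition graph splits into strongly connected components can be solved component by component, in reverse topological order, so each subproblem treats the values of downstream states as already-known constants. This is a generic illustration, not the specific method surveyed in the paper; the function name `solve_by_components`, the array layout of `P` and `R`, and the use of `networkx` are assumptions of the sketch.

```python
# Minimal sketch (assumed layout): solve a discounted MDP by decomposing its
# transition graph into strongly connected components and running value
# iteration on each component in reverse topological order.
import numpy as np
import networkx as nx

def solve_by_components(P, R, gamma=0.9, tol=1e-8):
    """P: (A, S, S) transition probabilities; R: (S, A) one-step rewards."""
    n_actions, n_states, _ = P.shape
    # Directed graph with an edge s -> s' whenever some action can reach s'.
    adjacency = (P.sum(axis=0) > 0).astype(int)
    G = nx.from_numpy_array(adjacency, create_using=nx.DiGraph)
    C = nx.condensation(G)          # DAG over strongly connected components
    V = np.zeros(n_states)
    # Solve "downstream" components first, so that inside a component the
    # values of states outside it are fixed constants.
    for comp in reversed(list(nx.topological_sort(C))):
        states = np.fromiter(C.nodes[comp]["members"], dtype=int)
        while True:
            # Bellman backup restricted to this component.
            Q = R[states] + gamma * np.einsum("aij,j->ia", P[:, states, :], V)
            new_values = Q.max(axis=1)
            delta = np.abs(new_values - V[states]).max()
            V[states] = new_values
            if delta < tol:
                break
    return V
```

When the MDP has no exploitable structure, the graph collapses into a single component and the sketch reduces to ordinary value iteration; the gain comes precisely from the decomposability that the surveyed techniques exploit.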

Highlights

  • This survey focuses on decomposition techniques for solving large MDPs

  • Markov chain models are still the most general class of analytic models widely used for performance and dependability analysis

  • We will summarize two decomposition techniques for tackling this complexity: the first is proposed by Ross and Varadarajan [38] and the second is introduced by Abbad and Boustique [36]


Summary

Large Markov Chain Models

Markov chain models are still the most general class of analytic models widely used for performance and dependability analysis. Lumping is only valid for models that have certain very limited types of structure. Another method for treating large state spaces is aggregation/disaggregation [3]. The solution method is generally an iterative one in which submodels are solved and the obtained results are used to adjust the submodels repeatedly until a convergence criterion is met. This is an efficient procedure if (a) the model is decomposable into tightly coupled submodels, (b) the state space of each submodel is practically solvable, and (c) the number of such submodels is not too large. When the state space is extremely large, the stationary state probability distribution is often highly skewed; that is, only a small subset of the states accounts for the vast majority of the probability mass. This is illustrated by considering the nature of several modeling application areas. Methods for computing bounds on performance measures are also presented.
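As a concrete, hedged illustration of the aggregation/disaggregation scheme described above, the following fragment performs one sweep: it builds the coupling (aggregated) matrix from the current approximation, solves the small aggregated chain, spreads the block probabilities back over the states, and smooths with a power step. The partition `blocks`, the power-step smoothing, and all variable names are assumptions of this sketch; practical solvers (for example the Koury-McAllister-Stewart variant) replace the power step with block solves on each submodel.

```python
# Minimal sketch (assumed interface): one aggregation/disaggregation sweep
# for the stationary distribution of a partitioned Markov chain.
import numpy as np

def iad_sweep(P, pi, blocks):
    """P: (n, n) row-stochastic matrix; pi: (n,) current approximation;
    blocks: list of index arrays partitioning the state space."""
    K = len(blocks)
    # Aggregation: coupling matrix of the block-level chain, weighted by the
    # current within-block distributions.
    phi = [pi[b] / pi[b].sum() for b in blocks]
    C = np.array([[phi[I] @ P[np.ix_(blocks[I], blocks[J])].sum(axis=1)
                   for J in range(K)] for I in range(K)])
    # Stationary distribution of the small aggregated chain: xi C = xi.
    w, v = np.linalg.eig(C.T)
    xi = np.real(v[:, np.argmax(np.real(w))])
    xi = xi / xi.sum()
    # Disaggregation: spread block probabilities over states, then smooth
    # with one power step and renormalise.
    z = np.zeros_like(pi)
    for I, b in enumerate(blocks):
        z[b] = xi[I] * phi[I]
    new_pi = z @ P
    return new_pi / new_pi.sum()
```

Iterating `pi = iad_sweep(P, pi, blocks)` from a positive starting vector until successive iterates agree approximates the stationary distribution; each sweep is cheap when the individual blocks are small relative to the full state space.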

Large Markov Decision Models
Markov Decision Processes
Some Decomposition Techniques for MDPs
Ross-Varadarajan Decomposition
Deterministic MDPs
The Aggregated MDP and Cycles
Abbad-Boustique Decomposition
Construction of the Restricted MDPs in Level L0
Hierarchical Reinforcement Learning
Factored Approaches
Hierarchical Approaches
Dantzig-Wolfe Decomposition
Conclusion