Abstract
In this paper we consider the constrained optimization of discrete-time Markov Decision Processes (MDPs) with finite state and action spaces, which accumulate both a reward and costs at each decision epoch. We study the problem of finding a policy that maximizes the expected total discounted reward subject to the constraints that the expected total discounted costs do not exceed given values. To this end, we investigate a method that decomposes the state space into strongly communicating classes in order to compute an optimal or nearly optimal stationary policy. The discounted criterion has many applications in areas such as forest management, energy consumption management, finance, communication systems (mobile networks) and artificial intelligence.
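For reference, the problem stated above can be written as follows. The notation is our own assumption, not taken from the paper: $\pi$ is a policy, $\mathbb{E}^{\pi}_{s_0}$ is the expectation when starting in the initial state $s_0$, $r$ is the reward, $c_k$ is the $k$-th of $K$ cost functions with bound $d_k$, and $\gamma \in [0,1)$ is the discount factor. The second block is the standard linear program over discounted occupation measures $\rho(s,a)$ (as in Altman's treatment of constrained MDPs); it is shown only as a baseline for computing an optimal stationary policy and is not the decomposition method of this paper.

```latex
% Constrained discounted problem
\begin{aligned}
\max_{\pi}\quad & \mathbb{E}^{\pi}_{s_0}\Big[\sum_{t=0}^{\infty}\gamma^{t}\, r(s_t,a_t)\Big] \\
\text{s.t.}\quad & \mathbb{E}^{\pi}_{s_0}\Big[\sum_{t=0}^{\infty}\gamma^{t}\, c_k(s_t,a_t)\Big] \le d_k, \qquad k = 1,\dots,K.
\end{aligned}

% Equivalent LP over occupation measures rho(s,a) >= 0
\begin{aligned}
\max_{\rho \ge 0}\quad & \sum_{s,a}\rho(s,a)\, r(s,a) \\
\text{s.t.}\quad & \sum_{a}\rho(s',a) - \gamma \sum_{s,a} P(s' \mid s,a)\,\rho(s,a) = \mathbb{1}[s' = s_0] \qquad \forall s' \in S, \\
& \sum_{s,a}\rho(s,a)\, c_k(s,a) \le d_k, \qquad k = 1,\dots,K.
\end{aligned}
```

An optimal stationary (possibly randomized) policy is then recovered as $\pi(a \mid s) = \rho(s,a) / \sum_{a'} \rho(s,a')$ at every state where the denominator is positive; at the remaining states any action may be chosen.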
Highlights
The decomposition method consists in dividing the state space into weakly coupled subsets.
In this work we model the environment as a Constrained Markov Decision Process (CMDP), defined by a tuple whose components are: the set of states $S$; the set of actions $A$; the transition probability $P(s' \mid s, a)$; the reward function $r(s, a)$, which denotes the immediate reward incurred by taking action $a$ in state $s$; the cost function $c(s, a)$, which is upper bounded; the bound of the cost constraint; the discount factor $\gamma$; and the fixed initial state $s_0$.
We solve the constrained discounted Markov Decision Process problem step by step, exploiting the decomposition of the state space into strongly communicating classes.
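To make the CMDP tuple concrete, the sketch below stores it in a small container and evaluates a fixed stationary randomized policy $\pi(a \mid s)$: both the discounted reward and the discounted cost from $s_0$ solve the linear system $(I - \gamma P_\pi)v = g_\pi$, where $g_\pi$ is the policy-averaged one-step reward or cost. It is a minimal sketch under stated assumptions: a single cost function, dense transition lists, and hypothetical names (`CMDP`, `evaluate`, `_solve`) that do not come from the paper. It only checks a given policy against the constraint; it does not perform the optimization or the decomposition.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CMDP:
    """Hypothetical container for the CMDP tuple (single cost, for brevity)."""
    P: List[List[List[float]]]  # P[s][a][t]: probability of moving from s to t under a
    r: List[List[float]]        # r[s][a]: immediate reward
    c: List[List[float]]        # c[s][a]: immediate cost
    gamma: float                # discount factor, 0 <= gamma < 1
    s0: int                     # fixed initial state


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def evaluate(m: CMDP, pi: List[List[float]]) -> Tuple[float, float]:
    """Expected total discounted (reward, cost) from s0 under stationary policy pi[s][a]."""
    n = len(m.P)
    acts = [range(len(m.P[s])) for s in range(n)]
    # Policy-averaged transition matrix and one-step reward/cost vectors.
    P_pi = [[sum(pi[s][a] * m.P[s][a][t] for a in acts[s]) for t in range(n)] for s in range(n)]
    r_pi = [sum(pi[s][a] * m.r[s][a] for a in acts[s]) for s in range(n)]
    c_pi = [sum(pi[s][a] * m.c[s][a] for a in acts[s]) for s in range(n)]
    # Both value vectors solve (I - gamma * P_pi) v = one-step vector.
    A = [[(1.0 if s == t else 0.0) - m.gamma * P_pi[s][t] for t in range(n)] for s in range(n)]
    return _solve(A, r_pi)[m.s0], _solve(A, c_pi)[m.s0]


# Toy instance: in state 0, either stay (reward 1, cost 0) or move to the
# absorbing state 1, which pays reward 2 and cost 1 at every step.
m = CMDP(
    P=[[[1.0, 0.0], [0.0, 1.0]],
       [[0.0, 1.0]]],
    r=[[1.0, 0.0], [2.0]],
    c=[[0.0, 0.0], [1.0]],
    gamma=0.9,
    s0=0,
)
stay = evaluate(m, [[1.0, 0.0], [1.0]])  # approximately (10.0, 0.0)
move = evaluate(m, [[0.0, 1.0], [1.0]])  # approximately (18.0, 9.0)
```

With an illustrative cost bound of 5, `stay` is feasible and `move` is not; mixing the two actions in state 0 trades reward against cost, which is why constrained problems generally require randomized stationary policies.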
Summary
The decomposition method consists in dividing the state space into weakly coupled subsets. This technique was first introduced by Bather [1]. Subsequently, Ross and Varadarajan [5] presented a similar decomposition method to solve the constrained problem for long-run average Markov Decision Processes. In this decomposition, the state space is partitioned into strongly communicating classes and a (possibly empty) set of transient states. We solve the constrained discounted Markov Decision Process problem step by step, exploiting the decomposition of the state space into strongly communicating classes.
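The partition described above can be computed with an iterative graph procedure. The sketch below assumes (our reading, not a statement from the paper) that a strongly communicating class is a maximal set of states that is closed and communicating under some stationary policy, which the model-checking literature calls a maximal end component. The procedure repeatedly computes strongly connected components of the transition graph, discards actions that can leave a state's component, discards states left with no action, and stops at a fixed point; the discarded states are the transient ones. The function name `communicating_decomposition` is hypothetical.

```python
from typing import List, Set, Tuple


def _tarjan_sccs(nodes: Set[int], succ) -> List[Set[int]]:
    """Strongly connected components of the graph (nodes, succ) via Tarjan's algorithm."""
    index, low, on_stack, stack, out = {}, {}, set(), [], []
    counter = [0]

    def visit(v: int) -> None:
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in succ(v):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            out.append(comp)

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return out


def communicating_decomposition(P: List[List[List[float]]]) -> Tuple[List[List[int]], List[int]]:
    """Return (classes, transient); P[s][a][t] is the probability of s -> t under a."""
    n = len(P)
    support = [[{t for t, p in enumerate(row) if p > 0} for row in P[s]] for s in range(n)]
    allowed = [set(range(len(P[s]))) for s in range(n)]
    alive = set(range(n))
    while True:
        def succ(s: int) -> Set[int]:
            # One-step successors using only the actions still allowed.
            return {t for a in allowed[s] for t in support[s][a] if t in alive}

        comps = _tarjan_sccs(alive, succ)
        comp_of = {s: i for i, comp in enumerate(comps) for s in comp}
        changed = False
        for s in alive:
            # Keep only actions whose successors all stay in s's component.
            keep = {a for a in allowed[s]
                    if all(t in alive and comp_of[t] == comp_of[s] for t in support[s][a])}
            if keep != allowed[s]:
                allowed[s] = keep
                changed = True
        dead = {s for s in alive if not allowed[s]}
        if dead:
            alive -= dead  # these states cannot remain inside any closed class
            changed = True
        if not changed:
            return sorted(sorted(comp) for comp in comps), sorted(set(range(n)) - alive)


# State 0 leaves to 1 or 3; states 1 and 2 can cycle forever; state 3 is absorbing.
P = [
    [[0.0, 0.5, 0.0, 0.5]],
    [[0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0, 1.0]],
]
classes, transient = communicating_decomposition(P)
print(classes, transient)  # [[1, 2], [3]] [0]
```

Each pruning step only removes actions that cannot belong to any closed communicating set, so no class is lost along the way, and at the fixed point every remaining component is closed and communicating under its surviving actions.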